Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2016/03/16 07:43:32 UTC

[GitHub] spark pull request: Parse modes in JSON data source

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/11756

    Parse modes in JSON data source

    ## What changes were proposed in this pull request?
    
    Currently, there is no way to control the behaviour when the JSON data source fails to parse corrupt records.
    
    This PR adds support for parse modes, just like the CSV data source. There are three modes:
    
    - `PERMISSIVE`: When it fails to parse, this sets the field to `null`.
    - `DROPMALFORMED`: When it fails to parse, this drops the whole record.
    - `FAILFAST`: When it fails to parse, this throws an exception.
    
    This PR also makes the JSON data source share the `ParseModes` used by the CSV data source.
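    
    As a rough illustration of the three modes, the sketch below models them in plain Scala without Spark. The names `ParseModeSketch`, `readAll`, and the toy `parse` function are hypothetical and are not part of this PR; `None` stands in for the `null` that `PERMISSIVE` produces.
    
    ```scala
    object ParseModeSketch {
      sealed trait Mode
      case object Permissive extends Mode
      case object DropMalformed extends Mode
      case object FailFast extends Mode

      // Toy parser (an assumption for illustration): a record is valid
      // only if it is an integer literal.
      def parse(line: String): Option[Int] =
        scala.util.Try(line.trim.toInt).toOption

      // Applies the chosen mode to every input line.
      def readAll(mode: Mode, lines: Seq[String]): Seq[Option[Int]] = mode match {
        // Keep every record; an unparsable one becomes None (null stand-in).
        case Permissive => lines.map(parse)
        // Silently skip records that fail to parse.
        case DropMalformed => lines.map(parse).filter(_.isDefined)
        // Abort on the first record that fails to parse.
        case FailFast =>
          lines.map { line =>
            Some(parse(line).getOrElse(
              throw new RuntimeException(s"Malformed line in FAILFAST mode: $line")))
          }
      }
    }
    ```
    
    In this model, `readAll(ParseModeSketch.DropMalformed, Seq("1", "oops", "3"))` keeps only the two parsable records, while the same input under `FailFast` throws.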
    
    ## How was this patch tested?
    
    Unit tests were used, and `./dev/run_tests` was run for code style tests.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-13764

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11756.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11756
    
----
commit 4c46f4b97b062cba06d5b9bb7987b3606b0dd4dc
Author: hyukjinkwon <gu...@gmail.com>
Date:   2016-03-16T06:34:29Z

    Parse modes in JSON data source

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197665908
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53382/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56299195
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -963,6 +963,31 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         )
       }
     
    +  test("SPARK-13764 Parse modes in JSON data source") {
    +    withSQLConf(SQLConf.COLUMN_NAME_OF_CORRUPT_RECORD.key -> "_unparsed") {
    +      // `FAILFAST` mode should throw an exception for corrupt records.
    +      val exception = intercept[SparkException] {
    +        sqlContext.read.option("mode", "FAILFAST").json(corruptRecords).collect()
    +      }
    +      assert(exception.getMessage.contains("Malformed line in FAILFAST mode: {"))
    +
    +      // `DROPMALFORMED` mode should skip corrupt records
    +      // For `PERMISSIVE` mode, it is tested in "Corrupt records" test.
    +      val jsonDF = sqlContext.read.option("mode", "DROPMALFORMED").json(corruptRecords)
    +      val schema = StructType(
    +        StructField("_unparsed", StringType, true) ::
    +          StructField("a", StringType, true) ::
    +          StructField("b", StringType, true) ::
    +          StructField("c", StringType, true) :: Nil)
    +      assert(schema === jsonDF.schema)
    +
    +      checkAnswer(
    +        jsonDF,
    +        Row(null, "str_a_4", "str_b_4", "str_c_4") :: Nil
    --- End diff --
    
    I realised that it will have a `_corrupt_record` field because it fails to parse corrupt records. So I thought it would be better to show `_unparsed` explicitly rather than using the default `_corrupt_record`.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199143263
  
    LGTM, pending tests




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197652643
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53379/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199157504
  
    **[Test build #53659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53659/consoleFull)** for PR 11756 at commit [`dec3d81`](https://github.com/apache/spark/commit/dec3d8167c862385a489b999feb7af6c03316cfa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56290903
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JSONOptions.scala ---
    @@ -49,6 +50,16 @@ private[sql] class JSONOptions(
       val allowBackslashEscapingAnyCharacter =
         parameters.get("allowBackslashEscapingAnyCharacter").map(_.toBoolean).getOrElse(false)
       val compressionCodec = parameters.get("compression").map(CompressionCodecs.getCodecClassName)
    +  private val parseMode = parameters.getOrElse("mode", "PERMISSIVE")
    +
    +  // Parse mode flags
    +  if (!ParseModes.isValidMode(parseMode)) {
    +    logWarning(s"$parseMode is not a valid parse mode. Using ${ParseModes.DEFAULT}.")
    --- End diff --
    
    Regarding https://github.com/apache/spark/pull/11756#discussion_r56290625: that warning should be dealt with here.
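    
    The validation in this diff can be pictured with a small stand-alone sketch. `ParseModesSketch` below is a hypothetical stand-in written for illustration; its member names, the `resolve` helper, and the case-insensitive comparison are assumptions, not the actual `ParseModes` object shared with the CSV data source.
    
    ```scala
    import java.util.Locale

    object ParseModesSketch {
      val Permissive = "PERMISSIVE"
      val DropMalformed = "DROPMALFORMED"
      val FailFast = "FAILFAST"
      val Default = Permissive

      private val validModes = Set(Permissive, DropMalformed, FailFast)

      // Assumption: mode names are compared case-insensitively.
      def isValidMode(mode: String): Boolean =
        validModes.contains(mode.toUpperCase(Locale.ROOT))

      // Returns the mode to use, plus a warning message when the requested
      // mode is invalid and the default is used instead.
      def resolve(parameters: Map[String, String]): (String, Option[String]) = {
        val requested = parameters.getOrElse("mode", Default)
        if (isValidMode(requested)) {
          (requested.toUpperCase(Locale.ROOT), None)
        } else {
          (Default, Some(s"$requested is not a valid parse mode. Using ${Default}."))
        }
      }
    }
    ```
    
    Returning the warning as a value, instead of logging it directly, keeps the sketch free of any logging dependency; a caller would pass it to something like `logWarning`.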




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197222768
  
    I'm not familiar with the CSV part. What if users set the schema directly before reading the data and the mode is `PERMISSIVE`? Will we add the extra field?




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-198215626
  
    **[Test build #53506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53506/consoleFull)** for PR 11756 at commit [`59e7214`](https://github.com/apache/spark/commit/59e72142dd801fb5c8266785153498ae4c67d5e6).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197276405
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53312/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197207970
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197234856
  
    **[Test build #53303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53303/consoleFull)** for PR 11756 at commit [`3675fae`](https://github.com/apache/spark/commit/3675faee2450677d56720afb8fb744d7482a99dc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-198212315
  
    **[Test build #53504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53504/consoleFull)** for PR 11756 at commit [`bfc0405`](https://github.com/apache/spark/commit/bfc04051cca8e88b5f4627560a4353b10b584416).




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56784237
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -963,6 +964,53 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         )
       }
     
    +  test("SPARK-13764 Parse modes in JSON data source") {
    --- End diff --
    
    split it into 2 cases: `Corrupt records: FAILFAST mode` and `Corrupt records: DROPMALFORMED mode`




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197224540
  
    For example, the data below:
    
    ```
    1,2,3,4
    3,2,1
    ```
    
    will produce the records below:
    
    - `PERMISSIVE` 
    ```
    Row(1,2,3,4)
    Row(3,2,1,null)
    ```
    
    - `PERMISSIVE` with user schema
    
    ```scala
    Schema("field1", "field2", "field3")
    ```
    ```
    Row(1,2,3)
    Row(3,2,1)
    ```
    
    - `DROPMALFORMED`
    
    ```
    Row(1,2,3,4)
    ```
    - `DROPMALFORMED` with user schema
    
    ```scala
    Schema("field1", "field2", "field3")
    ```
    ```
    Row(1,2,3)
    ```
    
    - `FAILFAST`
    
    Throws an exception.
    
    - `FAILFAST` with user schema
    
    Throws an exception.
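    
    The rows listed above for the cases without a user schema can be reproduced with a small plain-Scala model. This is only a sketch: `CsvModeSketch` and `read` are hypothetical names, the expected width is assumed to come from the first line, and the `null` padding mirrors the `PERMISSIVE` output shown above rather than the CSV data source's actual code.
    
    ```scala
    object CsvModeSketch {
      // Splits each line on commas and applies the parse mode by name.
      def read(mode: String, lines: Seq[String]): Seq[Seq[String]] = {
        val rows = lines.map(_.split(",", -1).toList)
        // Assumption: the first line defines the expected number of fields.
        val width = rows.headOption.map(_.length).getOrElse(0)
        mode match {
          case "PERMISSIVE" =>
            // Pad short rows with null (and cut long ones) to the expected width.
            rows.map(r => r.take(width).padTo(width, null: String))
          case "DROPMALFORMED" =>
            // Keep only rows with exactly the expected number of fields.
            rows.filter(_.length == width)
          case "FAILFAST" =>
            // Abort on the first row with an unexpected number of fields.
            rows.map { r =>
              if (r.length != width) {
                throw new RuntimeException(
                  s"Malformed line in FAILFAST mode: ${r.mkString(",")}")
              }
              r
            }
          case other =>
            throw new IllegalArgumentException(s"Unknown mode: $other")
        }
      }
    }
    ```
    
    Calling `CsvModeSketch.read("PERMISSIVE", Seq("1,2,3,4", "3,2,1"))` pads the second row with a trailing `null`, and `"DROPMALFORMED"` keeps only the first row.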




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199116308
  
    **[Test build #53652 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53652/consoleFull)** for PR 11756 at commit [`3ff900e`](https://github.com/apache/spark/commit/3ff900ec904991e79bf6267c16ee38dfc15660be).




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197235349
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197628087
  
    This makes sense to me. Actually, for CSV, when `any row does not have the same schema`, it just means `corrupted format`, as CSV has a very simple format and can always be parsed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197239116
  
    **[Test build #53312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53312/consoleFull)** for PR 11756 at commit [`4440a55`](https://github.com/apache/spark/commit/4440a5556309bdfeec52f00ef151897bcdb04336).




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197667954
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53384/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197636051
  
    **[Test build #53384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53384/consoleFull)** for PR 11756 at commit [`551593a`](https://github.com/apache/spark/commit/551593a96edacc731f4e76e1fc3c2ec9327220f2).




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56291887
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -991,6 +999,16 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
                 Row(null, null, null, "]") :: Nil
             )
     
    +        // Check if corrupt records are dropped.
    --- End diff --
    
    hmm, but the expected result is `Row("str_a_4", "str_b_4", "str_c_4", null)`, did I miss something here?




[GitHub] spark pull request: Parse modes in JSON data source

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197182238
  
    **[Test build #53286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53286/consoleFull)** for PR 11756 at commit [`4c46f4b`](https://github.com/apache/spark/commit/4c46f4b97b062cba06d5b9bb7987b3606b0dd4dc).




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197665701
  
    **[Test build #53382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53382/consoleFull)** for PR 11756 at commit [`29a8f68`](https://github.com/apache/spark/commit/29a8f68cc5b3ba686f56ffc664766920eb3c2824).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-198215423
  
    **[Test build #53506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53506/consoleFull)** for PR 11756 at commit [`59e7214`](https://github.com/apache/spark/commit/59e72142dd801fb5c8266785153498ae4c67d5e6).




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197208262
  
    **[Test build #53303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53303/consoleFull)** for PR 11756 at commit [`3675fae`](https://github.com/apache/spark/commit/3675faee2450677d56720afb8fb744d7482a99dc).




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56291446
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -991,6 +999,16 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
                 Row(null, null, null, "]") :: Nil
             )
     
    +        // Check if corrupt records are dropped.
    --- End diff --
    
    It's not dropped, but set to null, right? I'm not sure how to set JSON options in a SQL string, but by default it's PERMISSIVE, so having a null field looks reasonable to me.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197636844
  
    @cloud-fan Sorry, one more question. Would it be good to expose `spark.sql.columnNameOfCorruptRecord` as an option as well, just like the compression option for other data sources?




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199137111
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53652/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197299127
  
    Overall LGTM, thanks for working on it!




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199137110
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-198215630
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11756




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197207972
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53286/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-198214354
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197246235
  
    **[Test build #53313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53313/consoleFull)** for PR 11756 at commit [`32ae8b2`](https://github.com/apache/spark/commit/32ae8b2b2f7eeff8233218edd281338923106948).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199116238
  
    @cloud-fan Is this a typo maybe :)?




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56325150
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -963,6 +963,28 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         )
       }
     
    +  test("SPARK-13764 Parse modes in JSON data source") {
    +    // `FAILFAST` mode should throw an exception for corrupt records.
    +    val exception = intercept[SparkException] {
    +      sqlContext.read.option("mode", "FAILFAST").json(corruptRecords).collect()
    +    }
    +    assert(exception.getMessage.contains("Malformed line in FAILFAST mode: {"))
    +
    +    // `DROPMALFORMED` mode should skip corrupt records
    +    // For `PERMISSIVE` mode, it is tested in "Corrupt records" test.
    +    val jsonDF = sqlContext.read.option("mode", "DROPMALFORMED").json(corruptRecords)
    +    val schema = StructType(
    +      StructField("a", StringType, true) ::
    +        StructField("b", StringType, true) ::
    +        StructField("c", StringType, true) :: Nil)
    +    assert(schema === jsonDF.schema)
    +
    +    checkAnswer(
    +      jsonDF,
    +      Row("str_a_4", "str_b_4", "str_c_4") :: Nil
    +    )
    +  }
    --- End diff --
    
    Can we add more test cases with a user-specified schema?




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197652640
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197667739
  
    **[Test build #53384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53384/consoleFull)** for PR 11756 at commit [`551593a`](https://github.com/apache/spark/commit/551593a96edacc731f4e76e1fc3c2ec9327220f2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197207753
  
    **[Test build #53286 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53286/consoleFull)** for PR 11756 at commit [`4c46f4b`](https://github.com/apache/spark/commit/4c46f4b97b062cba06d5b9bb7987b3606b0dd4dc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56323774
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala ---
    @@ -40,6 +40,7 @@ private[sql] object InferSchema {
           configOptions: JSONOptions): StructType = {
         require(configOptions.samplingRatio > 0,
           s"samplingRatio (${configOptions.samplingRatio}) should be greater than 0")
    +    val shouldHandleCorruptRecord = !configOptions.dropMalformed
    --- End diff --
    
    I think only `PERMISSIVE_MODE` needs to handle corrupt records?
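
The point above can be illustrated with a small, self-contained Scala sketch. It is only a model under stated assumptions, not the code in `InferSchema.scala`: `InferSketch` and `inferColumns` are hypothetical names, each record is represented as `Some(field names)` when it parsed and `None` when it was corrupt, and columns are sorted by name to match the order used in the tests in this thread.

```scala
object InferSketch {
  // Hypothetical model of schema inference over already-classified records.
  // Some(fields) = the record parsed; None = the record was corrupt.
  def inferColumns(
      records: Seq[Option[Set[String]]],
      permissive: Boolean,
      corruptColumn: String): Seq[String] = {
    // Union of all field names seen in well-formed records.
    val dataColumns =
      records.collect { case Some(fields) => fields }.flatten.distinct.sorted
    val sawCorrupt = records.exists(_.isEmpty)
    // Only a permissive read keeps corrupt records, so only it needs the
    // extra column; the other modes drop the record or fail instead.
    if (permissive && sawCorrupt) (corruptColumn +: dataColumns).sorted
    else dataColumns
  }
}
```

Under this model, a non-permissive read never adds the corrupt-record column, which is the behaviour the review comment asks for.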




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56292393
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -991,6 +999,16 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
                 Row(null, null, null, "]") :: Nil
             )
     
    +        // Check if corrupt records are dropped.
    --- End diff --
    
    It shows the results below:
    ```
            sql(
              """
                |SELECT a, b, c, _unparsed
                |FROM jsonTableWithDropMalformed
              """.stripMargin).show()
    ```
    ```
    +---------+-------+-------+-------+
    |_unparsed|      a|      b|      c|
    +---------+-------+-------+-------+
    |     null|str_a_4|str_b_4|str_c_4|
    +---------+-------+-------+-------+
    ```
    
    However, as you said, I think I should separate them. Yes, it is a bit confusing.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197622125
  
    **[Test build #53379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53379/consoleFull)** for PR 11756 at commit [`de8d291`](https://github.com/apache/spark/commit/de8d291c393fc5dd6182c2cf9e2675889c3cd796).




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-198236936
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-198236937
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53507/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197246251
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53313/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197242349
  
    **[Test build #53313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53313/consoleFull)** for PR 11756 at commit [`32ae8b2`](https://github.com/apache/spark/commit/32ae8b2b2f7eeff8233218edd281338923106948).




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197667952
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197645874
  
    Yea I think it should be an option on each read, not a global option. Most global options don't make a lot of sense as global options.
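
One way to picture a per-read option that falls back to the global setting is the Scala sketch below. It is an illustration only: `CorruptColumnOption`, `resolve` and the per-read key `columnNameOfCorruptRecord` are assumptions made for this example, and `_corrupt_record` is assumed as the fallback default.

```scala
object CorruptColumnOption {
  // Global SQL configuration key mentioned in this thread.
  val GlobalKey = "spark.sql.columnNameOfCorruptRecord"
  // Hypothetical per-read option name, chosen for this sketch.
  val ReadKey = "columnNameOfCorruptRecord"
  // Assumed fallback when neither the read option nor the global conf is set.
  val Default = "_corrupt_record"

  // The per-read option wins; otherwise fall back to the global conf,
  // and finally to the default.
  def resolve(
      readOptions: Map[String, String],
      globalConf: Map[String, String]): String =
    readOptions.getOrElse(ReadKey, globalConf.getOrElse(GlobalKey, Default))
}
```

With this precedence, two reads in the same session could use different corrupt-record column names without touching the session-wide configuration.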





[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199158029
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53659/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199116245
  
    retest this please




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197634470
  
    **[Test build #53382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53382/consoleFull)** for PR 11756 at commit [`29a8f68`](https://github.com/apache/spark/commit/29a8f68cc5b3ba686f56ffc664766920eb3c2824).




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56291651
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -969,6 +969,14 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           withTempTable("jsonTable") {
             val jsonDF = sqlContext.read.json(corruptRecords)
             jsonDF.registerTempTable("jsonTable")
    +        val jsonDFWithDropMalformed =
    +          sqlContext.read.option("mode", "DROPMALFORMED").json(corruptRecords)
    +        jsonDFWithDropMalformed.registerTempTable("jsonTableWithDropMalformed")
    --- End diff --
    
    Do we persist the JSON options along with the data when registering the table? If not, I think this `jsonTableWithDropMalformed` is identical to `jsonTable`.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197276404
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197652379
  
    **[Test build #53379 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53379/consoleFull)** for PR 11756 at commit [`de8d291`](https://github.com/apache/spark/commit/de8d291c393fc5dd6182c2cf9e2675889c3cd796).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197624257
  
    @cloud-fan Actually, I have a question.
    In the JSON data source, I assumed that the JSON format itself allows a flexible schema, so records do not necessarily share the same fields, unlike CSV data.
    
    So I assumed that, for the JSON data source, "malformed" rows do not include rows that merely have a different schema (whereas for CSV, "malformed" rows do include rows with a different schema).
    
    This difference leads to different behaviour for each parse mode compared to the CSV data source.
    
    - **CSV**
      - `FAILFAST`: **It throws an exception if any row does not have the same schema** or if any row could not be converted into the user-given schema.
      - `DROPMALFORMED`: **It drops every row that does not have the same schema** or that could not be converted into the user-given schema.
    
    - **JSON**
      - `FAILFAST`: **It throws an exception if any row has a corrupted format** or if any row could not be converted into the user-given schema.
      - `DROPMALFORMED`: **It drops every row that has a corrupted format** or that could not be converted into the user-given schema.
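
To make the mode semantics listed above concrete, here is a minimal, self-contained Scala sketch. It is not the PR's implementation: `ParseMode`, `ParseModeSketch`, `toyParse` and `parseAll` are hypothetical names, and `toyParse` is a toy stand-in for a JSON parser that treats any string wrapped in braces as well-formed.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical model of the three parse modes discussed in this thread.
sealed trait ParseMode
case object Permissive extends ParseMode
case object DropMalformed extends ParseMode
case object FailFast extends ParseMode

object ParseModeSketch {
  // Toy stand-in for a real JSON parser: only checks the outer braces.
  def toyParse(record: String): Try[String] = {
    val trimmed = record.trim
    if (trimmed.startsWith("{") && trimmed.endsWith("}")) Success(trimmed)
    else Failure(new IllegalArgumentException(s"Malformed record: $record"))
  }

  // Returns one entry per surviving record: Some(parsed) for a good record,
  // None for a corrupt record kept with null fields (permissive mode only).
  def parseAll(records: Seq[String], mode: ParseMode): Seq[Option[String]] =
    records.flatMap { record =>
      (toyParse(record), mode) match {
        case (Success(value), _)         => Seq(Some(value))
        case (Failure(_), Permissive)    => Seq(None)  // keep the row, null out fields
        case (Failure(_), DropMalformed) => Seq.empty  // drop the whole record
        case (Failure(error), FailFast)  => throw error // abort on first corrupt record
      }
    }
}
```

For the three records `{"a":1}`, `{` and `]`, this model keeps three entries under `Permissive` (two of them `None`), keeps one under `DropMalformed`, and throws on the second record under `FailFast`.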





[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197629010
  
    LGTM, cc @davies  for another look.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197276172
  
    **[Test build #53312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53312/consoleFull)** for PR 11756 at commit [`4440a55`](https://github.com/apache/spark/commit/4440a5556309bdfeec52f00ef151897bcdb04336).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56300778
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -963,6 +963,31 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         )
       }
     
    +  test("SPARK-13764 Parse modes in JSON data source") {
    +    withSQLConf(SQLConf.COLUMN_NAME_OF_CORRUPT_RECORD.key -> "_unparsed") {
    +      // `FAILFAST` mode should throw an exception for corrupt records.
    +      val exception = intercept[SparkException] {
    +        sqlContext.read.option("mode", "FAILFAST").json(corruptRecords).collect()
    +      }
    +      assert(exception.getMessage.contains("Malformed line in FAILFAST mode: {"))
    +
    +      // `DROPMALFORMED` mode should skip corrupt records
    +      // For `PERMISSIVE` mode, it is tested in "Corrupt records" test.
    +      val jsonDF = sqlContext.read.option("mode", "DROPMALFORMED").json(corruptRecords)
    +      val schema = StructType(
    +        StructField("_unparsed", StringType, true) ::
    +          StructField("a", StringType, true) ::
    +          StructField("b", StringType, true) ::
    +          StructField("c", StringType, true) :: Nil)
    +      assert(schema === jsonDF.schema)
    +
    +      checkAnswer(
    +        jsonDF,
    +        Row(null, "str_a_4", "str_b_4", "str_c_4") :: Nil
    --- End diff --
    
    I see. This adds `_corrupt_record` in schema inference. 




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199160796
  
    Thanks! Merging to master!




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197645943
  
    But since we had it, I'd say we should keep it to avoid breaking compatibility. We can have the per-read option override the global option.
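
A minimal sketch of the precedence described above, assuming the reader options and the global configuration are both plain string maps. The per-read key name `columnNameOfCorruptRecord` and the object name `CorruptColumnName` are hypothetical and used only for illustration; this is not Spark's actual resolution code.

```scala
// Illustrative only: resolves the corrupt-record column name with the
// per-read option taking precedence over the global configuration.
object CorruptColumnName {
  // Global key as mentioned in this thread.
  val GlobalKey = "spark.sql.columnNameOfCorruptRecord"
  // Hypothetical per-read option key (an assumption, not confirmed by the thread).
  val ReadOptionKey = "columnNameOfCorruptRecord"
  // Fallback name used when neither source sets a value.
  val Default = "_corrupt_record"

  def resolve(readOptions: Map[String, String], globalConf: Map[String, String]): String =
    readOptions.get(ReadOptionKey)          // 1. per-read option wins
      .orElse(globalConf.get(GlobalKey))    // 2. otherwise the global setting
      .getOrElse(Default)                   // 3. otherwise the default name
}
```

Keeping the global key as the second lookup means existing jobs that only set the global configuration would keep their current behaviour.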





[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-198236738
  
    **[Test build #53507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53507/consoleFull)** for PR 11756 at commit [`3ff900e`](https://github.com/apache/spark/commit/3ff900ec904991e79bf6267c16ee38dfc15660be).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197246246
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199136840
  
    **[Test build #53652 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53652/consoleFull)** for PR 11756 at commit [`3ff900e`](https://github.com/apache/spark/commit/3ff900ec904991e79bf6267c16ee38dfc15660be).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56290625
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ParseModes.scala ---
    @@ -0,0 +1,41 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +private[datasources] object ParseModes {
    +  val PERMISSIVE_MODE = "PERMISSIVE"
    +  val DROP_MALFORMED_MODE = "DROPMALFORMED"
    +  val FAIL_FAST_MODE = "FAILFAST"
    +
    +  val DEFAULT = PERMISSIVE_MODE
    +
    +  def isValidMode(mode: String): Boolean = {
    +    mode.toUpperCase match {
    +      case PERMISSIVE_MODE | DROP_MALFORMED_MODE | FAIL_FAST_MODE => true
    +      case _ => false
    +    }
    +  }
    +
    +  def isDropMalformedMode(mode: String): Boolean = mode.toUpperCase == DROP_MALFORMED_MODE
    +  def isFailFastMode(mode: String): Boolean = mode.toUpperCase == FAIL_FAST_MODE
    +  def isPermissiveMode(mode: String): Boolean = if (isValidMode(mode))  {
    +    mode.toUpperCase == PERMISSIVE_MODE
    +  } else {
    +    true // We default to permissive if the mode string is not valid
    --- End diff --
    
    should we log a warning for this case?
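
A minimal sketch of what such a warning could look like, assuming the mode is normalized in a single place. The object name `ParseModesSketch`, the `resolve` method and the replaceable `warn` hook are hypothetical; the real code would presumably use Spark's own logging rather than printing to standard error.

```scala
// Illustrative only: normalizes a user-supplied parse mode and warns when
// the value is unknown, instead of silently falling back to PERMISSIVE.
object ParseModesSketch {
  val PermissiveMode = "PERMISSIVE"
  val DropMalformedMode = "DROPMALFORMED"
  val FailFastMode = "FAILFAST"

  private val validModes = Set(PermissiveMode, DropMalformedMode, FailFastMode)

  // Hypothetical logging hook; replaceable so that callers can capture warnings.
  var warn: String => Unit = msg => Console.err.println(s"WARN $msg")

  /** Returns the upper-cased mode, or PERMISSIVE (with a warning) if it is not valid. */
  def resolve(mode: String): String = {
    val upper = mode.toUpperCase
    if (validModes.contains(upper)) {
      upper
    } else {
      warn(s"$mode is not a valid parse mode. Using $PermissiveMode.")
      PermissiveMode
    }
  }
}
```

Resolving the mode once, rather than in each `is...Mode` check, would also keep the warning from being logged repeatedly for the same read.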




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56323218
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -162,6 +162,14 @@ def json(self, path, schema=None):
                     (e.g. 00012)
                 * ``allowBackslashEscapingAnyCharacter`` (default ``false``): allows accepting quoting \
                     of all character using backslash quoting mechanism
    +            *  ``mode`` (default ``PERMISSIVE``): allows a mode for dealing with corrupt records \
    +                during parsing.
    +                *  ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted \
    +                  record and puts the malformed string into a new field configured by \
    +                 ``spark.sql.columnNameOfCorruptRecord``. When a schema is set by user, it sets \
    +                 ``null`` for extra fields.
    +                *  ``DROPMALFORMED`` : ignores the whole corrupted records and append.
    --- End diff --
    
    should remove `and append`?




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-198214357
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53504/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197644079
  
    This is a good point, if we only use this config while dealing with JSON data. cc @rxin, what do you think?




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199090207
  
    retest it please.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197238803
  
    The commit I submitted includes comment changes and avoids adding a `_corrupt_record` field during type inference when the mode is `DROPMALFORMED`.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56299745
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -963,6 +963,31 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         )
       }
     
    +  test("SPARK-13764 Parse modes in JSON data source") {
    +    withSQLConf(SQLConf.COLUMN_NAME_OF_CORRUPT_RECORD.key -> "_unparsed") {
    +      // `FAILFAST` mode should throw an exception for corrupt records.
    +      val exception = intercept[SparkException] {
    +        sqlContext.read.option("mode", "FAILFAST").json(corruptRecords).collect()
    +      }
    +      assert(exception.getMessage.contains("Malformed line in FAILFAST mode: {"))
    +
    +      // `DROPMALFORMED` mode should skip corrupt records
    +      // For `PERMISSIVE` mode, it is tested in "Corrupt records" test.
    +      val jsonDF = sqlContext.read.option("mode", "DROPMALFORMED").json(corruptRecords)
    +      val schema = StructType(
    +        StructField("_unparsed", StringType, true) ::
    +          StructField("a", StringType, true) ::
    +          StructField("b", StringType, true) ::
    +          StructField("c", StringType, true) :: Nil)
    +      assert(schema === jsonDF.schema)
    +
    +      checkAnswer(
    +        jsonDF,
    +        Row(null, "str_a_4", "str_b_4", "str_c_4") :: Nil
    --- End diff --
    
    Hm, after looking into this again, it looks weird that it has the `_corrupt_record` field. Let me look into this more deeply.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56299031
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -963,6 +963,31 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         )
       }
     
    +  test("SPARK-13764 Parse modes in JSON data source") {
    +    withSQLConf(SQLConf.COLUMN_NAME_OF_CORRUPT_RECORD.key -> "_unparsed") {
    +      // `FAILFAST` mode should throw an exception for corrupt records.
    +      val exception = intercept[SparkException] {
    +        sqlContext.read.option("mode", "FAILFAST").json(corruptRecords).collect()
    +      }
    +      assert(exception.getMessage.contains("Malformed line in FAILFAST mode: {"))
    +
    +      // `DROPMALFORMED` mode should skip corrupt records
    +      // For `PERMISSIVE` mode, it is tested in "Corrupt records" test.
    +      val jsonDF = sqlContext.read.option("mode", "DROPMALFORMED").json(corruptRecords)
    +      val schema = StructType(
    +        StructField("_unparsed", StringType, true) ::
    +          StructField("a", StringType, true) ::
    +          StructField("b", StringType, true) ::
    +          StructField("c", StringType, true) :: Nil)
    +      assert(schema === jsonDF.schema)
    +
    +      checkAnswer(
    +        jsonDF,
    +        Row(null, "str_a_4", "str_b_4", "str_c_4") :: Nil
    --- End diff --
    
    And looks like it's always null?




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197290603
  
    Ah, thanks for the detailed explanation and examples!




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56784202
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -963,6 +964,53 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         )
       }
     
    +  test("SPARK-13764 Parse modes in JSON data source") {
    +    val schemaOne = StructType(
    +      StructField("a", StringType, true) ::
    +        StructField("b", StringType, true) ::
    +        StructField("c", StringType, true) :: Nil)
    +
    +    val schemaTwo = StructType(
    +        StructField("a", StringType, true) :: Nil)
    +
    +    // `FAILFAST` mode should throw an exception for corrupt records.
    +    val exceptionOne = intercept[SparkException] {
    +      sqlContext.read
    +        .option("mode", "FAILFAST")
    +        .json(corruptRecords)
    +        .collect()
    +    }
    +    assert(exceptionOne.getMessage.contains("Malformed line in FAILFAST mode: {"))
    +    val exceptionTwo = intercept[SparkException] {
    +      sqlContext.read
    +        .option("mode", "FAILFAST")
    +        .schema(schemaTwo)
    +        .json(corruptRecords)
    +        .collect()
    +    }
    +    assert(exceptionTwo.getMessage.contains("Malformed line in FAILFAST mode: {"))
    +
    +    // `DROPMALFORMED` mode should skip corrupt records
    +    // For `PERMISSIVE` mode, it is tested in "Corrupt records" test.
    +    val jsonDFOne = sqlContext.read
    +      .option("mode", "DROPMALFORMED")
    +      .json(corruptRecords)
    +    checkAnswer(
    +      jsonDFOne,
    +      Row("str_a_4", "str_b_4", "str_c_4") :: Nil
    +    )
    +    assert(jsonDFOne.schema === schemaOne)
    +
    +    val jsonDFTwo = sqlContext.read
    +      .option("mode", "DROPMALFORMED")
    +      .schema(schemaTwo)
    +      .json(corruptRecords)
    +    checkAnswer(
    +      jsonDFTwo,
    +      Row("str_a_4") :: Nil)
    +    assert(jsonDFTwo.schema === schemaTwo)
    +  }
    +
       test("Corrupt records") {
    --- End diff --
    
    we can change this to: "Corrupt records: PERMISSIVE mode"




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-198215631
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53506/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56291636
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -991,6 +999,16 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
                 Row(null, null, null, "]") :: Nil
             )
     
    +        // Check if corrupt records are dropped.
    --- End diff --
    
    No, it drops them because it selects from the table `jsonTableWithDropMalformed`. I did not add a test for `PERMISSIVE` mode because the JSON data source has already been behaving in `PERMISSIVE` mode.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199137105
  
    Last 2 comments, otherwise LGTM




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197658346
  
    Filed in https://issues.apache.org/jira/browse/SPARK-13953.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-198217639
  
    **[Test build #53507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53507/consoleFull)** for PR 11756 at commit [`3ff900e`](https://github.com/apache/spark/commit/3ff900ec904991e79bf6267c16ee38dfc15660be).




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199158025
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56298949
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -963,6 +963,31 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         )
       }
     
    +  test("SPARK-13764 Parse modes in JSON data source") {
    +    withSQLConf(SQLConf.COLUMN_NAME_OF_CORRUPT_RECORD.key -> "_unparsed") {
    +      // `FAILFAST` mode should throw an exception for corrupt records.
    +      val exception = intercept[SparkException] {
    +        sqlContext.read.option("mode", "FAILFAST").json(corruptRecords).collect()
    +      }
    +      assert(exception.getMessage.contains("Malformed line in FAILFAST mode: {"))
    +
    +      // `DROPMALFORMED` mode should skip corrupt records
    +      // For `PERMISSIVE` mode, it is tested in "Corrupt records" test.
    +      val jsonDF = sqlContext.read.option("mode", "DROPMALFORMED").json(corruptRecords)
    +      val schema = StructType(
    +        StructField("_unparsed", StringType, true) ::
    +          StructField("a", StringType, true) ::
    +          StructField("b", StringType, true) ::
    +          StructField("c", StringType, true) :: Nil)
    +      assert(schema === jsonDF.schema)
    +
    +      checkAnswer(
    +        jsonDF,
    +        Row(null, "str_a_4", "str_b_4", "str_c_4") :: Nil
    --- End diff --
    
    This confuses me: if we decide to ignore corrupted records, why do we need an extra field for the malformed string?




[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197186617
  
    cc @cloud-fan for review



[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56298666
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -288,6 +288,9 @@ class DataFrameReader private[sql](sqlContext: SQLContext) extends Logging {
        * </li>
        * <li>`allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers
        * (e.g. 00012)</li>
    +   * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
    +   * during parsing. When fails to parse, `PERMISSIVE` mode sets `null`, `DROPMALFORMED` drops the
    +   * record and `FAILFAST` throws an exception.<li>
    --- End diff --
    
    I think we need to say more about these 3 modes. From the tests, it looks to me that:
    
    * `PERMISSIVE` mode sets the other fields to `null` when it meets a corrupted record, and puts the malformed string into a new field whose name is configured by `spark.sql.columnNameOfCorruptRecord`.
    * `DROPMALFORMED` mode ignores corrupted records and appends a new field, which is always `null`, to the output.
    * `FAILFAST` mode throws an exception.
    
    It would be better if you could expand this doc and add some examples.
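    
    As a rough illustration of the three behaviours listed above, here is a minimal sketch in plain Scala with no Spark dependency. It is not Spark's implementation: `ParseMode`, `Record`, `parseLine` and `readAll` are hypothetical names, and the toy parser only accepts lines of the form `a=<value>`. The `unparsed` field stands in for the column named by `spark.sql.columnNameOfCorruptRecord`.

```scala
// Toy model of the three parse modes. NOT Spark code: every name below is
// hypothetical and exists only to illustrate the semantics described above.
sealed trait ParseMode
case object Permissive extends ParseMode
case object DropMalformed extends ParseMode
case object FailFast extends ParseMode

object ParseModesSketch {
  // One output row: the raw text of a corrupt record (if any) and field "a".
  final case class Record(unparsed: Option[String], a: Option[String])

  // Stand-in parser: a line is well formed only if it looks like "a=<value>".
  def parseLine(line: String): Option[String] =
    if (line.startsWith("a=") && line.length > 2) Some(line.drop(2)) else None

  def readAll(lines: Seq[String], mode: ParseMode): Seq[Record] =
    lines.flatMap { line =>
      (parseLine(line), mode) match {
        // Well-formed input is handled the same way in every mode.
        case (Some(value), _) => Some(Record(None, Some(value)))
        // PERMISSIVE: keep the row, null out the data field, keep the raw text.
        case (None, Permissive) => Some(Record(Some(line), None))
        // DROPMALFORMED: skip the row, so `unparsed` is never populated.
        case (None, DropMalformed) => None
        // FAILFAST: abort on the first corrupt record.
        case (None, FailFast) =>
          throw new RuntimeException(s"Malformed line in FAILFAST mode: $line")
      }
    }
}
```

    For example, `ParseModesSketch.readAll(Seq("a=1", "{"), DropMalformed)` returns a single record, while the same call with `FailFast` throws. In this model the `unparsed` field is always empty under `DropMalformed`, which mirrors the always-null column noted above.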



[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56306144
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -288,6 +288,9 @@ class DataFrameReader private[sql](sqlContext: SQLContext) extends Logging {
        * </li>
        * <li>`allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers
        * (e.g. 00012)</li>
    +   * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
    +   * during parsing. When fails to parse, `PERMISSIVE` mode sets `null`, `DROPMALFORMED` drops the
    +   * record and `FAILFAST` throws an exception.<li>
    --- End diff --
    
    Could I maybe edit this without adding examples? It is becoming a bit messy.



[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197235350
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53303/



[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56299307
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -963,6 +963,31 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         )
       }
     
    +  test("SPARK-13764 Parse modes in JSON data source") {
    +    withSQLConf(SQLConf.COLUMN_NAME_OF_CORRUPT_RECORD.key -> "_unparsed") {
    +      // `FAILFAST` mode should throw an exception for corrupt records.
    +      val exception = intercept[SparkException] {
    +        sqlContext.read.option("mode", "FAILFAST").json(corruptRecords).collect()
    +      }
    +      assert(exception.getMessage.contains("Malformed line in FAILFAST mode: {"))
    +
    +      // `DROPMALFORMED` mode should skip corrupt records
    +      // For `PERMISSIVE` mode, it is tested in "Corrupt records" test.
    +      val jsonDF = sqlContext.read.option("mode", "DROPMALFORMED").json(corruptRecords)
    +      val schema = StructType(
    +        StructField("_unparsed", StringType, true) ::
    +          StructField("a", StringType, true) ::
    +          StructField("b", StringType, true) ::
    +          StructField("c", StringType, true) :: Nil)
    +      assert(schema === jsonDF.schema)
    +
    +      checkAnswer(
    +        jsonDF,
    +        Row(null, "str_a_4", "str_b_4", "str_c_4") :: Nil
    --- End diff --
    
    That field holds the corrupt records when the mode is `PERMISSIVE`.



[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-197665906
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-198214348
  
    **[Test build #53504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53504/consoleFull)** for PR 11756 at commit [`bfc0405`](https://github.com/apache/spark/commit/bfc04051cca8e88b5f4627560a4353b10b584416).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11756#discussion_r56296202
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -963,6 +963,31 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         )
       }
     
    +  test("SPARK-13764 Parse modes in JSON data source") {
    +    withSQLConf(SQLConf.COLUMN_NAME_OF_CORRUPT_RECORD.key -> "_unparsed") {
    --- End diff --
    
    @cloud-fan Would you look through this please?



[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11756#issuecomment-199139827
  
    **[Test build #53659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53659/consoleFull)** for PR 11756 at commit [`dec3d81`](https://github.com/apache/spark/commit/dec3d8167c862385a489b999feb7af6c03316cfa).

