Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2018/02/21 09:11:00 UTC

[GitHub] spark pull request #20648: [SPARK-23448][SQL] JSON parser should return part...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/20648

    [SPARK-23448][SQL] JSON parser should return partial row when part of columns are failed to parse under PermissiveMode

    ## What changes were proposed in this pull request?
    
    When we read a JSON document with a corrupted field under `PermissiveMode`:
    ```json
    {"attr1":"val1","attr2":"[\"val2\"]"}
    {"attr1":"val1","attr2":["val2"]}
    ```
    
    ```scala
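    import org.apache.spark.sql.types._
    
    // `input` is assumed to be a path to (or a Dataset[String] of) the two JSON records above.
    val schema = StructType(
      Seq(StructField("attr1", StringType, true),
          StructField("attr2", ArrayType(StringType, true), true)))
    
    spark.read.schema(schema).json(input).collect().foreach(println)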
    ```
    
    We currently get these results:
    ```
    [null,null]
    [val1,WrappedArray(val2)]
    ```
    
    `FailureSafeParser` and `BadRecordException` suggest an intention to return a partial result for a corrupted record, but the current implementation never actually does so: as the example above shows, all columns are null. This patch tries to fill the gap and return the partial result.
    
    ## How was this patch tested?
    
    Pass added tests.
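
A minimal sketch of the difference described in the pull request text, written in plain Scala rather than against Spark's parser. The object `PermissiveRowSketch`, the `Converter` type and both functions are hypothetical names used only for illustration; fields are assumed to arrive as pre-split strings, and an integer column stands in for the array column of the example.

```scala
object PermissiveRowSketch {
  // A field converter: Right(value) on success, Left(reason) on a type mismatch.
  type Converter = String => Either[String, Any]

  val asString: Converter = s => Right(s)
  val asInt: Converter = s =>
    try Right(s.trim.toInt)
    catch { case _: NumberFormatException => Left(s"not an int: $s") }

  private def valueOrNull(e: Either[String, Any]): Any = e match {
    case Right(v) => v
    case Left(_)  => null
  }

  // Behaviour reported as current: one bad field nulls the whole row.
  def allOrNothing(fields: Seq[String], schema: Seq[Converter]): Seq[Any] = {
    val converted = fields.zip(schema).map { case (raw, conv) => conv(raw) }
    if (converted.forall(_.isRight)) converted.map(valueOrNull)
    else Seq.fill[Any](schema.length)(null)
  }

  // Behaviour the patch proposes: keep converted fields, null only the failures.
  def partial(fields: Seq[String], schema: Seq[Converter]): Seq[Any] =
    fields.zip(schema).map { case (raw, conv) => valueOrNull(conv(raw)) }
}
```

With the schema `Seq(asString, asInt)` and the fields `Seq("val1", "oops")`, `allOrNothing` yields a row of two nulls while `partial` keeps `val1` and nulls only the second column.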

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-23448

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20648.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20648
    
----
commit 3d7d0415f2bfc2274fe94636b222d1ee437b0d24
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-02-20T14:03:49Z

    Returns partial row when part of columns are failed to parse.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    retest this please.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    How about we start by disallowing partial results entirely, documenting the behaviour, and, for CSV, matching R's `read.csv(...)` (in terms of which cases are errors and which are not)?


---



[GitHub] spark pull request #20648: [SPARK-23448][SQL] JSON parser should return part...

Posted by viirya <gi...@git.apache.org>.
Github user viirya closed the pull request at:

    https://github.com/apache/spark/pull/20648


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    @HyukjinKwon @cloud-fan Thanks for the comments! Yes, I agree we need to keep the CSV behavior. I will check how much we can clean up while keeping it.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/983/
    Test PASSed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    I'll close this PR and create another PR to refactor the JSON parser and related code. Thanks @cloud-fan and @HyukjinKwon.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    **[Test build #87586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87586/testReport)** for PR 20648 at commit [`667dcd5`](https://github.com/apache/spark/commit/667dcd503f6d9ea47151846cf2824642d735b462).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/997/
    Test PASSed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    **[Test build #87603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87603/testReport)** for PR 20648 at commit [`667dcd5`](https://github.com/apache/spark/commit/667dcd503f6d9ea47151846cf2824642d735b462).


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    `FileBasedDataSourceSuite` is still flaky.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    **[Test build #87606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87606/testReport)** for PR 20648 at commit [`667dcd5`](https://github.com/apache/spark/commit/667dcd503f6d9ea47151846cf2824642d735b462).


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    +1 for disallowing it anyway if it was Wenchen's opinion too. Please go ahead. Will help double check anyway.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    **[Test build #87603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87603/testReport)** for PR 20648 at commit [`667dcd5`](https://github.com/apache/spark/commit/667dcd503f6d9ea47151846cf2824642d735b462).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    From the code, it looks like there is an intention to return partial results when a document fails to parse. This patch produces those partial results. But this should be considered a behavior change, and we should discuss whether it is acceptable.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    I think we should at least update the documentation for this behavior of the CSV reader.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    **[Test build #87581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87581/testReport)** for PR 20648 at commit [`3d7d041`](https://github.com/apache/spark/commit/3d7d0415f2bfc2274fe94636b222d1ee437b0d24).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/987/
    Test PASSed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87600/
    Test FAILed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    @HyukjinKwon The behavior of the CSV reader isn't consistent with the documentation of `DataFrameReader.csv`:
    
    ```
    `PERMISSIVE` : sets other fields to `null` when it meets a corrupted record, and puts
    the malformed string into a field configured by `columnNameOfCorruptRecord`.
    ```
    
    Given the documentation, I think we may need to disable it for CSV too. What do you think?



---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Yup, +1 for starting this by disallowing it, but to my knowledge R's read.csv allows the number of tokens to be shorter than its schema, putting nulls (or NA) into the missing fields, as a valid case.
    
    I was thinking of disallowing partial results but allowing fewer tokens than the schema as a valid case in CSV.
    
    I need to double check R's read.csv behaviour and the current behaviour, but that was roughly my thought.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    I was just double checking the current status for both CSV and JSON:
    
    It seems CSV fills in partial results and raises an exception (which permissive mode catches, keeping the corrupt record text) when the number of tokens doesn't match the length of the schema, whereas JSON simply allows this case without the exception (and also without the corrupt record text).
    
    Also, neither CSV nor JSON fills in partial results when the type of a field is mismatched, but this change supports that case in JSON.
    
    cc @cloud-fan too. I remember we had a short talk about partial results before. Do you think we should produce partial results in both CSV and JSON for mismatched types?
    
    It looks like we currently don't do this for either CSV or JSON when the types are mismatched, if I haven't missed something.
    



---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    > Yup, +1 for starting this by disallowing it, but to my knowledge R's read.csv allows the number of tokens to be shorter than its schema, putting nulls (or NA) into the missing fields, as a valid case.
    
    @HyukjinKwon If the number of tokens is longer than its schema, R's read.csv doesn't seem to raise an error. Is this also the behavior we want?
    
    Spark's CSV reader just drops the extra tokens in permissive mode.
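
The two token-count cases raised in this exchange (fewer tokens than the schema, and more tokens than the schema) can be sketched as follows. This is a simplified model in plain Scala, not Spark's CSV code; `CsvTokenAlignmentSketch` and `alignToSchema` are hypothetical names, and whether the row is additionally flagged as corrupt is deliberately not modelled.

```scala
object CsvTokenAlignmentSketch {
  // Drops tokens beyond the schema width and pads missing trailing
  // fields with null, so the result always has `schemaLength` entries.
  def alignToSchema(tokens: Seq[String], schemaLength: Int): Seq[String] =
    tokens.take(schemaLength).padTo(schemaLength, null: String)
}
```

For a three-column schema, `alignToSchema(Seq("a"), 3)` pads the row with two nulls, and `alignToSchema(Seq("a", "b", "c", "d"), 3)` drops the fourth token.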


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    **[Test build #87606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87606/testReport)** for PR 20648 at commit [`667dcd5`](https://github.com/apache/spark/commit/667dcd503f6d9ea47151846cf2824642d735b462).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    **[Test build #87586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87586/testReport)** for PR 20648 at commit [`667dcd5`](https://github.com/apache/spark/commit/667dcd503f6d9ea47151846cf2824642d735b462).


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    **[Test build #87581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87581/testReport)** for PR 20648 at commit [`3d7d041`](https://github.com/apache/spark/commit/3d7d0415f2bfc2274fe94636b222d1ee437b0d24).


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Will check this one by tomorrow.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87606/
    Test PASSed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Yup, it's unsupported in JSON but CSV supports it. Do you mean to disallow it in CSV too, or simply clean up the JSON code path?


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1001/
    Test PASSed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87603/
    Test FAILed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    I think we do have an intention to return partial results, but there is no strict definition of it, and there seems to be no public documentation, so it's effectively a new feature.
    
    Since this is a non-trivial feature, the first question is: do we want it? There is no JIRA ticket requesting this feature, so I feel it is not urgent. We can refactor the code to make it clearer.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Yes, thanks @HyukjinKwon for checking the behavior. If we look at the JSON parser code, we find many places that expect partial results to be available.
    
    For example, `BadRecordException` has a `partialResult` that is supposed to hold the partial result of parsing a bad record. But we never actually use it to return a partial result; we just pass `None` for it.
    
    Note: If we don't want to return partial results at all, we should refactor this part of the code to make that clear. If we decide not to change the current behavior, I can submit another PR to do the refactoring.
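
A minimal sketch of the pattern this comment describes: an exception that can carry a partial row, and a wrapper that falls back to an all-null row when none is supplied. It is plain Scala and does not use Spark's classes; `FailureSafeSketch`, `BadRecord` and `permissive` are hypothetical stand-ins, and their shapes are assumptions made for illustration.

```scala
object FailureSafeSketch {
  // Hypothetical stand-in for an exception that may carry a partially parsed row.
  final case class BadRecord(raw: String, partialResult: Option[Seq[Any]])
      extends Exception(s"bad record: $raw")

  // Wraps a raw parser. On failure it emits the partial row if the parser
  // supplied one, otherwise an all-null row of the schema's width.
  def permissive(parse: String => Seq[Any], width: Int)(raw: String): Seq[Any] =
    try parse(raw)
    catch {
      case BadRecord(_, partial) => partial.getOrElse(Seq.fill[Any](width)(null))
    }
}
```

If every failing parser throws `BadRecord(raw, None)`, the `getOrElse` branch is the only one ever taken, which is the situation the comment points out: the slot for a partial result exists but is never filled.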



---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87586/
    Test FAILed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    cc @HyukjinKwon Can you check whether this behavior makes sense to you?


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    **[Test build #87600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87600/testReport)** for PR 20648 at commit [`667dcd5`](https://github.com/apache/spark/commit/667dcd503f6d9ea47151846cf2824642d735b462).


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/999/
    Test PASSed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    > allows the number of tokens to be shorter than its schema, putting nulls (or NA) into missing fields
    
    Actually I also recall that this is a valid case for CSV, and I remember that we did this intentionally. How much can we clean up if we want to keep this behavior in CSV? If it's not a lot, maybe we don't need to bother refactoring.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    retest this please.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87581/
    Test FAILed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    _To me_, I have been roughly thinking that we had better match it to R's read.csv and document this explicitly. I believe it is a good reference that our CSV has resembled so far.
    
    BTW, I don't mind doing this separately; go with whatever you think is right.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    retest this please.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    **[Test build #87600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87600/testReport)** for PR 20648 at commit [`667dcd5`](https://github.com/apache/spark/commit/667dcd503f6d9ea47151846cf2824642d735b462).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    According to an offline discussion with @cloud-fan, partial results are not supported at all now. We should refactor the code to clean this up and reduce confusion.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20648
  
    Merged build finished. Test PASSed.


---
