Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/11/19 21:28:32 UTC

[GitHub] spark pull request #23091: [SPARK-26122][SQL] Support encoding for multiLine...

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/23091

    [SPARK-26122][SQL] Support encoding for multiLine in CSV datasource

    ## What changes were proposed in this pull request?
    
    In this PR, I propose passing the CSV option `encoding`/`charset` to the `uniVocity` parser so that CSV files in different encodings can be parsed when `multiLine` is enabled. The value of the option is passed to the `beginParsing` method of `CsvParser`.
    
    ## How was this patch tested?
    
    Added a new test to `CSVSuite` that covers different encodings, with the header both enabled and disabled.
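
The idea behind the change can be illustrated outside Spark. The sketch below is not the code from this PR and does not use uniVocity: it relies on Python's standard `csv` module, and the function name `parse_multiline_csv` and the sample data are hypothetical. It only shows why the charset has to reach the parser itself when quoted fields may span several lines, and it exercises the same combinations the test description mentions (several encodings, header enabled and disabled).

```python
import csv
import io


def parse_multiline_csv(raw: bytes, encoding: str, header: bool):
    """Decode a byte stream with the given charset and parse it as CSV.

    Quoted fields may contain line breaks, so the input cannot be split
    into lines before parsing; the parser must read the decoded stream.
    """
    # newline="" leaves line breaks untouched so the csv module can tell
    # row terminators apart from line breaks inside quoted fields.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    rows = list(csv.reader(text))
    if header:
        # Treat the first row as column names (hypothetical convention).
        names, data = rows[0], rows[1:]
        return [dict(zip(names, row)) for row in data]
    return rows


# Hypothetical sample: one record whose second field spans two lines.
sample = 'name,comment\r\n"Jørgen","first line\nsecond line"\r\n'

for enc in ("utf-8", "utf-16le", "iso-8859-1"):
    raw = sample.encode(enc)
    with_header = parse_multiline_csv(raw, enc, header=True)
    without_header = parse_multiline_csv(raw, enc, header=False)
    print(enc, with_header, without_header)
```

In each case the byte stream is decoded with the charset named by the caller before the parser sees it, which is the role the `encoding` option plays in the description above.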


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 csv-miltiline-encoding

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23091
    
----
commit 1a7a0cb4430f847ac95c0c764393003581415103
Author: Maxim Gekk <ma...@...>
Date:   2018-11-19T20:51:04Z

    Added a test

commit cd57ec5833bbfb5f0b33d63a56b48d25924f6be1
Author: Maxim Gekk <ma...@...>
Date:   2018-11-19T21:07:41Z

    Test multiple encodings

commit 1c76f8944979df8a7b9b8181ebfa38933c3f2c00
Author: Maxim Gekk <ma...@...>
Date:   2018-11-19T21:09:04Z

    Pass encoding to uniVocity parser

commit 16eb14c73f3fad8d83fee41d5665b52f180daf73
Author: Maxim Gekk <ma...@...>
Date:   2018-11-19T21:22:23Z

    Test with header and without it

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23091
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5152/


---



[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23091
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99021/


---



[GitHub] spark pull request #23091: [SPARK-26122][SQL] Support encoding for multiLine...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23091


---



[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23091
  
    Merged build finished. Test PASSed.


---





[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23091
  
    Merged to master.


---



[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/23091
  
    @HyukjinKwon Please take a look at the PR.


---



[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23091
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23091
  
    **[Test build #99021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99021/testReport)** for PR 23091 at commit [`16eb14c`](https://github.com/apache/spark/commit/16eb14c73f3fad8d83fee41d5665b52f180daf73).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23091
  
    **[Test build #99021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99021/testReport)** for PR 23091 at commit [`16eb14c`](https://github.com/apache/spark/commit/16eb14c73f3fad8d83fee41d5665b52f180daf73).


---



[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23091
  
    FYI @priancho: IIRC, you proposed a similar change on the mailing list before. I wasn't positive about it at the time because I thought we should deprecate the `encoding` option. After a long discussion, we're going to support this.


---
