Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/11/19 21:28:32 UTC
[GitHub] spark pull request #23091: [SPARK-26122][SQL] Support encoding for multiLine...
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/23091
[SPARK-26122][SQL] Support encoding for multiLine in CSV datasource
## What changes were proposed in this pull request?
This PR passes the CSV option `encoding`/`charset` to the uniVocity parser so that CSV files in different encodings can be parsed when `multiLine` is enabled. The option value is passed to the `beginParsing` method of `CsvParser`.
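As a rough illustration of why the charset has to reach the parser in multi-line mode, here is a minimal sketch that uses only Python's standard library, not Spark or uniVocity. The helper `parse_multiline_csv` and the sample data are hypothetical. It decodes the raw bytes with the requested encoding before records are split, so a quoted field that contains a line break stays in one record.

```python
import csv
import io


def parse_multiline_csv(raw: bytes, encoding: str, header: bool = True):
    """Decode raw bytes with the given charset, then parse CSV records.

    Hypothetical helper for illustration only; it is not Spark's code path.
    """
    # newline="" leaves line breaks untouched so the csv module can keep
    # newlines that appear inside quoted fields.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    rows = list(csv.reader(text))
    if header:
        return rows[0], rows[1:]
    return None, rows


# Sample file content: the first record has a quoted field spanning two lines.
sample = 'id,comment\r\n1,"first line\nsecond line"\r\n2,"Grüße"\r\n'
raw = sample.encode("utf-16-le")

columns, records = parse_multiline_csv(raw, "utf-16-le")
print(columns)
print(records)
```

In Spark, the corresponding read would presumably combine the two options on the CSV reader, for example `spark.read.option("multiLine", true).option("encoding", "UTF-16LE").csv(path)`, where `path` is a placeholder.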
## How was this patch tested?
Added a new test to `CSVSuite` that covers different encodings, with the header both enabled and disabled.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 csv-miltiline-encoding
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23091.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #23091
----
commit 1a7a0cb4430f847ac95c0c764393003581415103
Author: Maxim Gekk <ma...@...>
Date: 2018-11-19T20:51:04Z
Added a test
commit cd57ec5833bbfb5f0b33d63a56b48d25924f6be1
Author: Maxim Gekk <ma...@...>
Date: 2018-11-19T21:07:41Z
Test multiple encodings
commit 1c76f8944979df8a7b9b8181ebfa38933c3f2c00
Author: Maxim Gekk <ma...@...>
Date: 2018-11-19T21:09:04Z
Pass encoding to uniVocity parser
commit 16eb14c73f3fad8d83fee41d5665b52f180daf73
Author: Maxim Gekk <ma...@...>
Date: 2018-11-19T21:22:23Z
Test with header and without it
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23091
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5152/
---
[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23091
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99021/
---
[GitHub] spark pull request #23091: [SPARK-26122][SQL] Support encoding for multiLine...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/23091
---
[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23091
Merged build finished. Test PASSed.
---
[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23091
Merged build finished. Test PASSed.
---
[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23091
Merged to master.
---
[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/23091
@HyukjinKwon Please take a look at the PR.
---
[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23091
Can one of the admins verify this patch?
---
[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23091
**[Test build #99021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99021/testReport)** for PR 23091 at commit [`16eb14c`](https://github.com/apache/spark/commit/16eb14c73f3fad8d83fee41d5665b52f180daf73).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23091
**[Test build #99021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99021/testReport)** for PR 23091 at commit [`16eb14c`](https://github.com/apache/spark/commit/16eb14c73f3fad8d83fee41d5665b52f180daf73).
---
[GitHub] spark issue #23091: [SPARK-26122][SQL] Support encoding for multiLine in CSV...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23091
FYI @priancho: IIRC, you proposed a similar change on the mailing list before. I wasn't positive about it at the time because I thought we should deprecate the `encoding` option. After a long discussion, we're going to support this.
---