Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/08/02 12:10:42 UTC
[GitHub] spark pull request #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21969
[SPARK-24945][SQL] Switching to uniVocity 2.7.3
## What changes were proposed in this pull request?
This PR upgrades the uniVocity parser from **2.6.3** to **2.7.3**. The newer version includes a fix for the SPARK-24645 issue and performs better in the CSV benchmarks.
Before the change:
```
Parsing quoted values:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
One quoted string                           33336 / 34122          0.0      666727.0       1.0X
Wide rows with 1000 columns:            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Select 1000 columns                         90287 / 91713          0.0       90286.9       1.0X
Select 100 columns                          31826 / 36589          0.0       31826.4       2.8X
Select one column                           25738 / 25872          0.0       25737.9       3.5X
count()                                       6931 / 7269          0.1        6931.5      13.0X
```
After the change:
```
Parsing quoted values:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
One quoted string                           33411 / 33510          0.0      668211.4       1.0X
Wide rows with 1000 columns:            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Select 1000 columns                         88028 / 89311          0.0       88028.1       1.0X
Select 100 columns                          29010 / 32755          0.0       29010.1       3.0X
Select one column                           22936 / 22953          0.0       22936.5       3.8X
count()                                       6657 / 6740          0.2        6656.6      13.5X
```
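To make the before/after comparison easier to read, here is a minimal sketch that computes the relative change in best time for each benchmark case. The numbers are copied from the two tables above. The script and its names (`before`, `after`, `percent_change`) are illustrative only and are not part of the patch or of `CSVBenchmarks`.

```python
# Best times in milliseconds, copied from the "before" and "after" tables above.
before = {
    "One quoted string": 33336,
    "Select 1000 columns": 90287,
    "Select 100 columns": 31826,
    "Select one column": 25738,
    "count()": 6931,
}
after = {
    "One quoted string": 33411,
    "Select 1000 columns": 88028,
    "Select 100 columns": 29010,
    "Select one column": 22936,
    "count()": 6657,
}


def percent_change(old_ms, new_ms):
    """Relative change in run time; a negative value means the new run is faster."""
    return (new_ms - old_ms) / old_ms * 100.0


# Hypothetical helper structure: benchmark case -> change in best time (percent).
changes = {name: percent_change(before[name], after[name]) for name in before}

for name, pct in changes.items():
    print(f"{name:<22}{pct:+6.1f}%")
```

By this measure the quoted-string case is essentially unchanged, while the wide-row cases improve by roughly 2.5% to 11% in best time. These are single benchmark runs, so small differences may be noise.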
Closes #21892
## How was this patch tested?
It was tested with `CSVSuite` and `CSVBenchmarks`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 univocity-2_7_3
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21969.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21969
----
commit 7b569ae1318316129d4b0d46969b02324b18b0aa
Author: Maxim Gekk <ma...@...>
Date: 2018-07-27T11:59:39Z
Bumping version of uniVocity parser up to 2.7.2
commit b116987d9a0adb887201177d41c1b94e6f5aeb63
Author: Maxim Gekk <ma...@...>
Date: 2018-07-27T13:25:11Z
Call uniVocity even the set of selected columns is empty
commit 3fb9cf76df65abe14dd39d233d18242e72e0a729
Author: Maxim Gekk <ma...@...>
Date: 2018-08-02T09:14:27Z
Bumping version to 2.7.3
commit a053994bcc6027668f64c9e55d09dfaa45cb97cf
Author: Maxim Gekk <ma...@...>
Date: 2018-08-02T09:14:48Z
Revert "Call uniVocity even the set of selected columns is empty"
This reverts commit b116987d9a0adb887201177d41c1b94e6f5aeb63.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21969
**[Test build #93997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93997/testReport)** for PR 21969 at commit [`a053994`](https://github.com/apache/spark/commit/a053994bcc6027668f64c9e55d09dfaa45cb97cf).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21969
**[Test build #93997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93997/testReport)** for PR 21969 at commit [`a053994`](https://github.com/apache/spark/commit/a053994bcc6027668f64c9e55d09dfaa45cb97cf).
---
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/21969
LGTM
---
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21969
**[Test build #94027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94027/testReport)** for PR 21969 at commit [`a053994`](https://github.com/apache/spark/commit/a053994bcc6027668f64c9e55d09dfaa45cb97cf).
---
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21969
retest this please
---
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21969
**[Test build #94027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94027/testReport)** for PR 21969 at commit [`a053994`](https://github.com/apache/spark/commit/a053994bcc6027668f64c9e55d09dfaa45cb97cf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/21969
LGTM
---
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21969
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21969
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21969
Can one of the admins verify this patch?
---
[GitHub] spark pull request #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21969
---
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21969
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94027/
Test PASSed.
---
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21969
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93997/
Test FAILed.
---
[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21969
Merged to master.
---