Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/08/02 12:10:42 UTC

[GitHub] spark pull request #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/21969

    [SPARK-24945][SQL] Switching to uniVocity 2.7.3

    ## What changes were proposed in this pull request?
    
    In this PR, I propose upgrading the uniVocity parser from **2.6.3** to **2.7.3**. The newer version includes a fix for the SPARK-24645 issue and performs better on the wide-row benchmarks below.
    
    Before changes:
    ```
    Parsing quoted values:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    One quoted string                           33336 / 34122          0.0      666727.0       1.0X
    
    Wide rows with 1000 columns:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Select 1000 columns                         90287 / 91713          0.0       90286.9       1.0X
    Select 100 columns                          31826 / 36589          0.0       31826.4       2.8X
    Select one column                           25738 / 25872          0.0       25737.9       3.5X
    count()                                       6931 / 7269          0.1        6931.5      13.0X
    ```
    After changes:
    ```
    Parsing quoted values:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    One quoted string                           33411 / 33510          0.0      668211.4       1.0X
    
    Wide rows with 1000 columns:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Select 1000 columns                         88028 / 89311          0.0       88028.1       1.0X
    Select 100 columns                          29010 / 32755          0.0       29010.1       3.0X
    Select one column                           22936 / 22953          0.0       22936.5       3.8X
    count()                                       6657 / 6740          0.2        6656.6      13.5X
    ```
    Closes #21892 
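
To make the before/after comparison easier to read, the best-case times from the two tables above can be turned into a percentage change per benchmark. The following is a minimal sketch rather than part of the patch: the function names `percent_faster` and `summarize` and the dictionary layout are illustrative assumptions, and the numbers are copied from the "Best" column of the tables.

```python
# Best-case times in ms, copied from the "Best" column of the tables above.
# Layout: benchmark name -> (before, after). The structure is illustrative only.
BEST_MS = {
    "One quoted string": (33336, 33411),
    "Select 1000 columns": (90287, 88028),
    "Select 100 columns": (31826, 29010),
    "Select one column": (25738, 22936),
    "count()": (6931, 6657),
}


def percent_faster(before_ms: float, after_ms: float) -> float:
    """Return the reduction in time as a percentage of the 'before' time.

    Positive means the 'after' run was faster; negative means it was slower.
    """
    return (before_ms - after_ms) / before_ms * 100.0


def summarize(results):
    """Map each benchmark name to its percentage change."""
    return {name: percent_faster(b, a) for name, (b, a) in results.items()}


if __name__ == "__main__":
    for name, pct in summarize(BEST_MS).items():
        print(f"{name:<22} {pct:+.1f}%")
```

On these figures the wide-row cases come out roughly 2.5% to 11% faster, while the single quoted string case is essentially unchanged (about 0.2% slower on best time, which is smaller than the gap between the best and average times in either run).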
    
    ## How was this patch tested?
    
    It was tested with `CSVSuite` and `CSVBenchmarks`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 univocity-2_7_3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21969.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21969
    
----
commit 7b569ae1318316129d4b0d46969b02324b18b0aa
Author: Maxim Gekk <ma...@...>
Date:   2018-07-27T11:59:39Z

    Bumping version of uniVocity parser up to 2.7.2

commit b116987d9a0adb887201177d41c1b94e6f5aeb63
Author: Maxim Gekk <ma...@...>
Date:   2018-07-27T13:25:11Z

    Call uniVocity even the set of selected columns is empty

commit 3fb9cf76df65abe14dd39d233d18242e72e0a729
Author: Maxim Gekk <ma...@...>
Date:   2018-08-02T09:14:27Z

    Bumping version to 2.7.3

commit a053994bcc6027668f64c9e55d09dfaa45cb97cf
Author: Maxim Gekk <ma...@...>
Date:   2018-08-02T09:14:48Z

    Revert "Call uniVocity even the set of selected columns is empty"
    
    This reverts commit b116987d9a0adb887201177d41c1b94e6f5aeb63.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    **[Test build #93997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93997/testReport)** for PR 21969 at commit [`a053994`](https://github.com/apache/spark/commit/a053994bcc6027668f64c9e55d09dfaa45cb97cf).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    **[Test build #93997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93997/testReport)** for PR 21969 at commit [`a053994`](https://github.com/apache/spark/commit/a053994bcc6027668f64c9e55d09dfaa45cb97cf).


---



[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    LGTM


---



[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    **[Test build #94027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94027/testReport)** for PR 21969 at commit [`a053994`](https://github.com/apache/spark/commit/a053994bcc6027668f64c9e55d09dfaa45cb97cf).


---



[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    retest this please


---



[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    **[Test build #94027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94027/testReport)** for PR 21969 at commit [`a053994`](https://github.com/apache/spark/commit/a053994bcc6027668f64c9e55d09dfaa45cb97cf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    LGTM


---



[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21969


---





[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94027/
    Test PASSed.


---



[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93997/
    Test FAILed.


---



[GitHub] spark issue #21969: [SPARK-24945][SQL] Switching to uniVocity 2.7.3

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21969
  
    Merged to master.


---
