You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/11/08 12:46:09 UTC

[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/22979

    [SPARK-25977][SQL] Parsing decimals from CSV using locale

    ## What changes were proposed in this pull request?
    
    In the PR, I propose using of the locale option to parse decimals from CSV input. After the changes, `UnivocityParser` converts input string to `BigDecimal` and to Spark's Decimal by using `java.text.DecimalFormat`.
    
    ## How was this patch tested?
    
    Added a test for the `en-US`, `ko-KR`, `ru-RU`, `de-DE` locales.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 decimal-parsing-locale

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22979.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22979
    
----
commit c567dccc487d9f495eb00ca412d5b48d7899f401
Author: Maxim Gekk <ma...@...>
Date:   2018-11-08T12:31:06Z

    Add a test

commit 2b41eba3c666efc2b6860fd31e387839afe378d5
Author: Maxim Gekk <ma...@...>
Date:   2018-11-08T12:31:24Z

    Fix decimal parsing

commit cf438ae2071f952eb72aab6dddf47c2aeb930d84
Author: Maxim Gekk <ma...@...>
Date:   2018-11-08T12:31:35Z

    Add locale option

commit f9438c4b94d8a1dcf8b51d087cabae5a3c420db7
Author: Maxim Gekk <ma...@...>
Date:   2018-11-08T12:36:57Z

    Updating the migration guide

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r233242705
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ---
    @@ -104,6 +106,14 @@ class UnivocityParser(
         requiredSchema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
       }
     
    +  private val decimalParser = if (SQLConf.get.legacyDecimalParsing) {
    +    (s: String) => new BigDecimal(s.replaceAll(",", ""))
    --- End diff --
    
    I made some changes but a few tests in `CSVSuite` begun failing because Decimal type is inferred from a timestamps. For example, `2015` from `2015-08-20 15:57:00.0`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r232487559
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ---
    @@ -149,8 +156,8 @@ class UnivocityParser(
     
         case dt: DecimalType => (d: String) =>
           nullSafeDatum(d, name, nullable, options) { datum =>
    -        val value = new BigDecimal(datum.replaceAll(",", ""))
    -        Decimal(value, dt.precision, dt.scale)
    +        val bigDecimal = decimalParser.parse(datum).asInstanceOf[BigDecimal]
    --- End diff --
    
    > is it safe that we assume this Number is BigDecimal?
    
    I am not absolutely sure that it always return `BigDecimal`. Found this at https://docs.oracle.com/javase/8/docs/api/java/text/DecimalFormat.html#parse(java.lang.String,java.text.ParsePosition) :
    ```
    If isParseBigDecimal() is true, values are returned as BigDecimal objects. The values are the ones constructed by BigDecimal.BigDecimal(String) for corresponding strings in locale-independent format. The special cases negative and positive infinity and NaN are returned as Double instances holding the values of the corresponding Double constants.
    ```
    So, `isParseBigDecimal()` returns `true` when `setParseBigDecimal` was called with `true` as in the PR.
    
    > Looks there are some possibilities that it can return other types.
    
    In that case we just fail with a cast exception and the record will be handled as a bad record. or you want to see more clear message in the exception?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    @HyukjinKwon Could you take a look at this one more time, please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98600/testReport)** for PR 22979 at commit [`3125c23`](https://github.com/apache/spark/commit/3125c234182cf38b503f99e8d8095c2055186c20).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    jenkins, retest this, please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99403/testReport)** for PR 22979 at commit [`e989b77`](https://github.com/apache/spark/commit/e989b77df5aaced9ffa8212ca2adc1b0d60cfe96).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99142/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99323 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99323/testReport)** for PR 22979 at commit [`0d1a4f0`](https://github.com/apache/spark/commit/0d1a4f0c8a762ceab757da1c5dd89f7922eb863d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Looks good. Will take a closer look.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98816/testReport)** for PR 22979 at commit [`14b5109`](https://github.com/apache/spark/commit/14b5109bfba9111a9d4ecb463713a1526b1e3fb1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98737/testReport)** for PR 22979 at commit [`c28b79f`](https://github.com/apache/spark/commit/c28b79f3b4c485968df61281079114c1d6756348).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99185/testReport)** for PR 22979 at commit [`2918d04`](https://github.com/apache/spark/commit/2918d04b774ed4135eab6301584261ff124335d0).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r232486670
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ---
    @@ -149,8 +156,8 @@ class UnivocityParser(
     
         case dt: DecimalType => (d: String) =>
           nullSafeDatum(d, name, nullable, options) { datum =>
    -        val value = new BigDecimal(datum.replaceAll(",", ""))
    -        Decimal(value, dt.precision, dt.scale)
    +        val bigDecimal = decimalParser.parse(datum).asInstanceOf[BigDecimal]
    --- End diff --
    
    @MaxGekk, is it safe that we assume this `Number` is `BigDecimal`? Looks there are some possibilities that it can return other types.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99166/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98702/testReport)** for PR 22979 at commit [`3dfce18`](https://github.com/apache/spark/commit/3dfce18280bad432a9faaa546d33e9d693a56f1e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98600/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99169 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99169/testReport)** for PR 22979 at commit [`2918d04`](https://github.com/apache/spark/commit/2918d04b774ed4135eab6301584261ff124335d0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r232520110
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ---
    @@ -149,8 +156,8 @@ class UnivocityParser(
     
         case dt: DecimalType => (d: String) =>
           nullSafeDatum(d, name, nullable, options) { datum =>
    -        val value = new BigDecimal(datum.replaceAll(",", ""))
    -        Decimal(value, dt.precision, dt.scale)
    +        val bigDecimal = decimalParser.parse(datum).asInstanceOf[BigDecimal]
    --- End diff --
    
    Sounds good if that's not difficult.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98640/testReport)** for PR 22979 at commit [`64a97a2`](https://github.com/apache/spark/commit/64a97a27e4b22e9999605f3b2ddfebb7eaebdebc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98651/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r232487778
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ---
    @@ -149,8 +156,8 @@ class UnivocityParser(
     
         case dt: DecimalType => (d: String) =>
           nullSafeDatum(d, name, nullable, options) { datum =>
    -        val value = new BigDecimal(datum.replaceAll(",", ""))
    -        Decimal(value, dt.precision, dt.scale)
    +        val bigDecimal = decimalParser.parse(datum).asInstanceOf[BigDecimal]
    --- End diff --
    
    Ah, right. The previous codes will anyway throw an exception, I see. One thing I am a little bit unsure is how much different the behaviour is. For instance, looks the previous one handles sign character as well (`+` and `-`).
    
    Let me take a closer look. I think I need to.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99416/testReport)** for PR 22979 at commit [`e989b77`](https://github.com/apache/spark/commit/e989b77df5aaced9ffa8212ca2adc1b0d60cfe96).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99403/testReport)** for PR 22979 at commit [`e989b77`](https://github.com/apache/spark/commit/e989b77df5aaced9ffa8212ca2adc1b0d60cfe96).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98600/testReport)** for PR 22979 at commit [`3125c23`](https://github.com/apache/spark/commit/3125c234182cf38b503f99e8d8095c2055186c20).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99212/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r232487851
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ---
    @@ -149,8 +156,8 @@ class UnivocityParser(
     
         case dt: DecimalType => (d: String) =>
           nullSafeDatum(d, name, nullable, options) { datum =>
    -        val value = new BigDecimal(datum.replaceAll(",", ""))
    -        Decimal(value, dt.precision, dt.scale)
    +        val bigDecimal = decimalParser.parse(datum).asInstanceOf[BigDecimal]
    --- End diff --
    
    For instance, there was a similar try to change the date parsing library. I already know the different is quite breaking - so I suggested to add a configuration or fallback for now. Probably we should similarily just document the behaviour change in the migration guide but actually less sure yet even about this. anyway will take another look shortly.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99169 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99169/testReport)** for PR 22979 at commit [`2918d04`](https://github.com/apache/spark/commit/2918d04b774ed4135eab6301584261ff124335d0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98737/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99166 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99166/testReport)** for PR 22979 at commit [`2918d04`](https://github.com/apache/spark/commit/2918d04b774ed4135eab6301584261ff124335d0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99195/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98702/testReport)** for PR 22979 at commit [`3dfce18`](https://github.com/apache/spark/commit/3dfce18280bad432a9faaa546d33e9d693a56f1e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99416/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98738/testReport)** for PR 22979 at commit [`1723da2`](https://github.com/apache/spark/commit/1723da2293e098279d8070fa270dd3785c45af1c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98651/testReport)** for PR 22979 at commit [`64a97a2`](https://github.com/apache/spark/commit/64a97a27e4b22e9999605f3b2ddfebb7eaebdebc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99142/testReport)** for PR 22979 at commit [`db37a71`](https://github.com/apache/spark/commit/db37a71b7421eeee6befd00e700fa2cfd2b27785).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99403/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99323 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99323/testReport)** for PR 22979 at commit [`0d1a4f0`](https://github.com/apache/spark/commit/0d1a4f0c8a762ceab757da1c5dd89f7922eb863d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99169/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    The PR requires this PR #22951 to support the `locale` option properly.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r235618859
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala ---
    @@ -79,4 +83,22 @@ object CSVExprUtils {
             throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
         }
       }
    +
    +  def getDecimalParser(useLegacyParser: Boolean, locale: Locale): String => java.math.BigDecimal = {
    +    if (useLegacyParser) {
    +      (s: String) => new BigDecimal(s.replaceAll(",", ""))
    +    } else {
    +      val df = new DecimalFormat("", new DecimalFormatSymbols(locale))
    --- End diff --
    
    renamed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98599 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98599/testReport)** for PR 22979 at commit [`f9438c4`](https://github.com/apache/spark/commit/f9438c4b94d8a1dcf8b51d087cabae5a3c420db7).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    @HyukjinKwon Could it be related to recent changes in python tests?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98738/testReport)** for PR 22979 at commit [`1723da2`](https://github.com/apache/spark/commit/1723da2293e098279d8070fa270dd3785c45af1c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98702/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98816/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98735/testReport)** for PR 22979 at commit [`8c5593e`](https://github.com/apache/spark/commit/8c5593ef13c3f4da5d2c746e906e8a33e46ee2a8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    No chance to pass tests in the PR ;-)
    ```
    test_aggregator (pyspark.sql.tests.test_group.GroupTests) ... #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x0000000000000080, pid=40070, tid=139648880690944
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # C  0x0000000000000080
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r232486599
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ---
    @@ -104,6 +105,12 @@ class UnivocityParser(
         requiredSchema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
       }
     
    +  private val decimalParser = {
    +    val df = new DecimalFormat("", new DecimalFormatSymbols(options.locale))
    --- End diff --
    
    not a big deal but I would just name it `decimalFormat`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99211/testReport)** for PR 22979 at commit [`15a09b8`](https://github.com/apache/spark/commit/15a09b8f5a8181e2e758f108a2734c22af4928de).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99456 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99456/testReport)** for PR 22979 at commit [`521bd45`](https://github.com/apache/spark/commit/521bd45c95f56d5bff3656d0af018f963c524605).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98816/testReport)** for PR 22979 at commit [`14b5109`](https://github.com/apache/spark/commit/14b5109bfba9111a9d4ecb463713a1526b1e3fb1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class CSVInferSchema(options: CSVOptions) extends Serializable `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99456/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98735 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98735/testReport)** for PR 22979 at commit [`8c5593e`](https://github.com/apache/spark/commit/8c5593ef13c3f4da5d2c746e906e8a33e46ee2a8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99195 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99195/testReport)** for PR 22979 at commit [`2918d04`](https://github.com/apache/spark/commit/2918d04b774ed4135eab6301584261ff124335d0).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r233353589
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ---
    @@ -104,6 +106,14 @@ class UnivocityParser(
         requiredSchema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
       }
     
    +  private val decimalParser = if (SQLConf.get.legacyDecimalParsing) {
    +    (s: String) => new BigDecimal(s.replaceAll(",", ""))
    --- End diff --
    
    I made parsing more strict. Should be fine now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99211/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99323/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99211/testReport)** for PR 22979 at commit [`15a09b8`](https://github.com/apache/spark/commit/15a09b8f5a8181e2e758f108a2734c22af4928de).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class CSVInferSchema(options: CSVOptions) extends Serializable `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    jenkins, retest this, please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98640/testReport)** for PR 22979 at commit [`64a97a2`](https://github.com/apache/spark/commit/64a97a27e4b22e9999605f3b2ddfebb7eaebdebc).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98737/testReport)** for PR 22979 at commit [`c28b79f`](https://github.com/apache/spark/commit/c28b79f3b4c485968df61281079114c1d6756348).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98651/testReport)** for PR 22979 at commit [`64a97a2`](https://github.com/apache/spark/commit/64a97a27e4b22e9999605f3b2ddfebb7eaebdebc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    jenkins, retest this, please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99142/testReport)** for PR 22979 at commit [`db37a71`](https://github.com/apache/spark/commit/db37a71b7421eeee6befd00e700fa2cfd2b27785).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99416/testReport)** for PR 22979 at commit [`e989b77`](https://github.com/apache/spark/commit/e989b77df5aaced9ffa8212ca2adc1b0d60cfe96).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99317/testReport)** for PR 22979 at commit [`0d1a4f0`](https://github.com/apache/spark/commit/0d1a4f0c8a762ceab757da1c5dd89f7922eb863d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r232496879
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ---
    @@ -149,8 +156,8 @@ class UnivocityParser(
     
         case dt: DecimalType => (d: String) =>
           nullSafeDatum(d, name, nullable, options) { datum =>
    -        val value = new BigDecimal(datum.replaceAll(",", ""))
    -        Decimal(value, dt.precision, dt.scale)
    +        val bigDecimal = decimalParser.parse(datum).asInstanceOf[BigDecimal]
    --- End diff --
    
    > so I suggested to add a configuration or fallback for now ...
    
    What about SQL config `spark.sql.legacy.decimalParsing.enabled` with default value `false`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99185/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22979


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #98599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98599/testReport)** for PR 22979 at commit [`f9438c4`](https://github.com/apache/spark/commit/f9438c4b94d8a1dcf8b51d087cabae5a3c420db7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    jenkins, retest this, please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99212/testReport)** for PR 22979 at commit [`5236336`](https://github.com/apache/spark/commit/5236336cefce6798397cc9f955664435f6dc3f28).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99317/testReport)** for PR 22979 at commit [`0d1a4f0`](https://github.com/apache/spark/commit/0d1a4f0c8a762ceab757da1c5dd89f7922eb863d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    @HyukjinKwon May I ask you to review the latest changes here, please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98640/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Also let me leave a cc for @srowen.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r232486778
  
    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -9,6 +9,8 @@ displayTitle: Spark SQL Upgrading Guide
     
     ## Upgrading From Spark SQL 2.4 to 3.0
     
    +  - Since Spark 3.0, to parse decimals in locale specific format from CSV, set the `locale` option to proper value.
    --- End diff --
    
    While we are here, let's format the migration guide item format as others. like ..
    
    In Spark version 2.4 and earlier, it only deals with locale specific decimal notation like `,`. Since Spark 3.0, the locale can be set by `locale` and default locale is ....
    
    Please feel free to change as you think is righter


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r232484798
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala ---
    @@ -226,4 +227,17 @@ class CsvExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper with P
             InternalRow(17836)) // number of days from 1970-01-01
         }
       }
    +
    +  test("parse decimals using locale") {
    +    Seq("en-US", "ko-KR", "ru-RU", "de-DE").foreach { langTag =>
    +      val schema = new StructType().add("d", DecimalType(10, 5))
    +      val options = Map("locale" -> langTag, "sep" -> "|")
    +      val expected = Decimal(1000.001, 10, 5)
    +      val df = new DecimalFormat("", new DecimalFormatSymbols(Locale.forLanguageTag(langTag)))
    +      val input = df.format(expected.toBigDecimal)
    +      checkEvaluation(
    +        CsvToStructs(schema, options, Literal.create(input), gmtId),
    +        InternalRow(expected))
    +    }
    +  }
    --- End diff --
    
    @MaxGekk, there's `UnivocityParserSuite`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99456/testReport)** for PR 22979 at commit [`521bd45`](https://github.com/apache/spark/commit/521bd45c95f56d5bff3656d0af018f963c524605).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98599/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99185/testReport)** for PR 22979 at commit [`2918d04`](https://github.com/apache/spark/commit/2918d04b774ed4135eab6301584261ff124335d0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r235588064
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala ---
    @@ -79,4 +83,22 @@ object CSVExprUtils {
             throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $str")
         }
       }
    +
    +  def getDecimalParser(useLegacyParser: Boolean, locale: Locale): String => java.math.BigDecimal = {
    +    if (useLegacyParser) {
    +      (s: String) => new BigDecimal(s.replaceAll(",", ""))
    +    } else {
    +      val df = new DecimalFormat("", new DecimalFormatSymbols(locale))
    --- End diff --
    
    let's name `df` `decimalFormat` ..


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99212/testReport)** for PR 22979 at commit [`5236336`](https://github.com/apache/spark/commit/5236336cefce6798397cc9f955664435f6dc3f28).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r232484751
  
    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -9,6 +9,8 @@ displayTitle: Spark SQL Upgrading Guide
     
     ## Upgrading From Spark SQL 2.4 to 3.0
     
    +  - Since Spark 3.0, to parse decimals in locale specific format from CSV, set the `locale` option to proper value.
    --- End diff --
    
    @MaxGekk, it's not a behaviour change I guess.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98735/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22979#discussion_r233009506
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ---
    @@ -104,6 +106,14 @@ class UnivocityParser(
         requiredSchema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
       }
     
    +  private val decimalParser = if (SQLConf.get.legacyDecimalParsing) {
    +    (s: String) => new BigDecimal(s.replaceAll(",", ""))
    --- End diff --
    
    @MaxGekk, looks we should do the same thing in schema inference path as well.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    jenkins, retest this, please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99166 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99166/testReport)** for PR 22979 at commit [`2918d04`](https://github.com/apache/spark/commit/2918d04b774ed4135eab6301584261ff124335d0).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    **[Test build #99195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99195/testReport)** for PR 22979 at commit [`2918d04`](https://github.com/apache/spark/commit/2918d04b774ed4135eab6301584261ff124335d0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98738/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22979
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99317/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org