Posted to issues@spark.apache.org by "koert kuipers (JIRA)" <ji...@apache.org> on 2019/04/21 23:04:00 UTC

[jira] [Comment Edited] (SPARK-27512) Decimal parsing leads to unexpected type inference

    [ https://issues.apache.org/jira/browse/SPARK-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822790#comment-16822790 ] 

koert kuipers edited comment on SPARK-27512 at 4/21/19 11:03 PM:
-----------------------------------------------------------------

[~maxgekk] Maxim, do you know why getDecimalParser has that if condition for Locale.US, where it calls
{code:java}
(s: String) => new java.math.BigDecimal(s.replaceAll(",", ""))
{code}
I think it's that
{code:java}
s.replaceAll(",", "")
{code}
that is causing my issues.
I saw it was introduced in:
{code:java}
commit 7a83d71403edf7d24fa5efc0ef913f3ce76d88b8
Author: Maxim Gekk <ma...@gmail.com>
Date:   Thu Nov 29 22:15:12 2018 +0800

    [SPARK-26163][SQL] Parsing decimals from JSON using locale
{code}
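
For illustration, here is a minimal sketch of the suspected effect, using only java.math.BigDecimal and no Spark at all. The names usDecimalParser and strictParse are hypothetical stand-ins, not Spark's actual source; the first simply copies the lambda quoted above. If this is indeed what schema inference calls, it would explain why "1,2" comes back as 12 with type decimal(2,0) instead of staying a string.

```scala
// Hypothetical stand-in for the Locale.US branch quoted above (not Spark's source).
val usDecimalParser: String => java.math.BigDecimal =
  (s: String) => new java.math.BigDecimal(s.replaceAll(",", ""))

// Hypothetical strict variant: plain BigDecimal parsing without stripping commas.
def strictParse(s: String): Option[java.math.BigDecimal] =
  scala.util.Try(new java.math.BigDecimal(s)).toOption

// "1,2" has its comma removed before parsing, so it is read as the number 12.
val parsed = usDecimalParser("1,2")   // 12
val precision = parsed.precision      // 2
val scale = parsed.scale              // 0, consistent with decimal(2,0)

// Without the replaceAll, the same input is rejected (NumberFormatException).
val strict = strictParse("1,2")       // None
```

With the strict variant the value fails to parse as a decimal, which matches the 2.4.1 behavior in the report below, where column y was inferred as a string.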



> Decimal parsing leads to unexpected type inference
> --------------------------------------------------
>
>                 Key: SPARK-27512
>                 URL: https://issues.apache.org/jira/browse/SPARK-27512
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: spark 3.0.0-SNAPSHOT from this commit:
> {code:bash}
> commit 3ab96d7acf870e53c9016b0b63d0b328eec23bed
> Author: Dilip Biswal <db...@us.ibm.com>
> Date:   Mon Apr 15 21:26:45 2019 +0800
> {code}
>            Reporter: koert kuipers
>            Priority: Minor
>
> {code:bash}
> $ hadoop fs -text test.bsv
> x|y
> 1|1,2
> 2|2,3
> 3|3,4
> {code}
> in spark 2.4.1:
> {code:bash}
> scala> val data = spark.read.format("csv").option("header", true).option("delimiter", "|").option("inferSchema", true).load("test.bsv")
> scala> data.printSchema
> root
>  |-- x: integer (nullable = true)
>  |-- y: string (nullable = true)
> scala> data.show
> +---+---+
> |  x|  y|
> +---+---+
> |  1|1,2|
> |  2|2,3|
> |  3|3,4|
> +---+---+
> {code}
> in spark 3.0.0-SNAPSHOT:
> {code:bash}
> scala> val data = spark.read.format("csv").option("header", true).option("delimiter", "|").option("inferSchema", true).load("test.bsv")
> scala> data.printSchema
> root
>  |-- x: integer (nullable = true)
>  |-- y: decimal(2,0) (nullable = true)
> scala> data.show
> +---+---+
> |  x|  y|
> +---+---+
> |  1| 12|
> |  2| 23|
> |  3| 34|
> +---+---+
> {code}


