Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/12/01 14:09:00 UTC

[jira] [Commented] (SPARK-26233) Incorrect decimal value with java beans and first/last/max... functions

    [ https://issues.apache.org/jira/browse/SPARK-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705835#comment-16705835 ] 

Hyukjin Kwon commented on SPARK-26233:
--------------------------------------

Hi [~mcanes], it looks like this isn't specific to Java beans. Would you mind making a reproducer with plain Scala collections? There are several decimal precision issues going on, and a minimised reproducer would help identify whether this is a duplicate.
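
For illustration (not part of the original comment), a bean-free reproducer along the lines requested might look like the sketch below. It is written in Java to match the report rather than Scala, and it assumes the same DecimalType(38, 18) that Encoders.bean infers for BigDecimal fields; the spark session variable and the exact rows are hypothetical.

{code:java}
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.functions.*;

// Build the rows from a plain collection, bypassing Encoders.bean entirely.
List<Row> rows = Arrays.asList(
    RowFactory.create("0", new BigDecimal("0.1111")),
    RowFactory.create("1", new BigDecimal("1.1111")));

// Declare the same wide decimal type that the bean encoder would infer.
StructType schema = new StructType()
    .add("group", DataTypes.StringType)
    .add("var", DataTypes.createDecimalType(38, 18));

Dataset<Row> df = spark.createDataFrame(rows, schema);

// If the bug is not bean-specific, first() should mis-scale here as well.
df.groupBy(col("group")).agg(first("var")).show();
{code}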

> Incorrect decimal value with java beans and first/last/max... functions
> -----------------------------------------------------------------------
>
>                 Key: SPARK-26233
>                 URL: https://issues.apache.org/jira/browse/SPARK-26233
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.3.1, 2.4.0
>            Reporter: Miquel
>            Priority: Minor
>
> Decimal values from Java beans are incorrectly scaled when used with functions like first/last/max...
> This problem arises because Encoders.bean always maps BigDecimal fields to _DecimalType(this.MAX_PRECISION(), 18)._
> This usually isn't a problem with numeric functions like *sum*, but it is a problem for functions like *first*/*last*/*max*...
> How to reproduce this error:
> Using this class as an example:
> {code:java}
> import java.io.Serializable;
> import java.math.BigDecimal;
>
> public class Foo implements Serializable {
>   private String group;
>   private BigDecimal var;
>   public BigDecimal getVar() {
>     return var;
>   }
>   public void setVar(BigDecimal var) {
>     this.var = var;
>   }
>   public String getGroup() {
>     return group;
>   }
>   public void setGroup(String group) {
>     this.group = group;
>   }
> }
> {code}
>  
> And some dummy code to create a few objects:
> {code:java}
> import org.apache.spark.api.java.function.MapFunction;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Encoders;
>
> Dataset<Foo> ds = spark.range(5)
>     .map((MapFunction<Long, Foo>) l -> {
>       Foo foo = new Foo();
>       foo.setGroup("" + l);
>       foo.setVar(BigDecimal.valueOf(l + 0.1111));
>       return foo;
>     }, Encoders.bean(Foo.class));
> ds.printSchema();
> ds.show();
> +-----+------+
> |group|   var|
> +-----+------+
> |    0|0.1111|
> |    1|1.1111|
> |    2|2.1111|
> |    3|3.1111|
> |    4|4.1111|
> +-----+------+
> {code}
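> For reference (this output is not in the original report, but it is what the bean encoder described above should produce), the printSchema() call prints the inferred schema along these lines:
> {code:java}
> root
>  |-- group: string (nullable = true)
>  |-- var: decimal(38,18) (nullable = true)
> {code}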
> We can see that the DecimalType has precision 38 and scale 18, and all values are shown correctly.
> But if we use the first function, the values are scaled incorrectly:
> {code:java}
> import static org.apache.spark.sql.functions.*;
>
> ds.groupBy(col("group"))
>     .agg(
>         first("var")
>     )
>     .show();
> +-----+-----------------+
> |group|first(var, false)|
> +-----+-----------------+
> |    3|       3.1111E-14|
> |    0|        1.111E-15|
> |    1|       1.1111E-14|
> |    4|       4.1111E-14|
> |    2|       2.1111E-14|
> +-----+-----------------+
> {code}
> This incorrect behavior cannot be reproduced with "numeric" functions like sum, or if the column is cast to a new DecimalType. The pattern of the wrong values suggests what is happening: BigDecimal.valueOf(3.1111) produces unscaled value 31111 with scale 4, and reinterpreting that unscaled value under the declared scale of 18 yields 31111 * 10^-18 = 3.1111E-14.
> {code:java}
> import org.apache.spark.sql.types.DecimalType;
> import static org.apache.spark.sql.functions.*;
>
> ds.groupBy(col("group"))
>     .agg(
>         sum("var")
>     )
>     .show();
> +-----+--------------------+
> |group|            sum(var)|
> +-----+--------------------+
> |    3|3.111100000000000000|
> |    0|0.111100000000000000|
> |    1|1.111100000000000000|
> |    4|4.111100000000000000|
> |    2|2.111100000000000000|
> +-----+--------------------+
> ds.groupBy(col("group"))
>     .agg(
>         first(col("var").cast(new DecimalType(38, 8)))
>     )
>     .show();
> +-----+----------------------------------------+
> |group|first(CAST(var AS DECIMAL(38,8)), false)|
> +-----+----------------------------------------+
> |    3|                              3.11110000|
> |    0|                              0.11110000|
> |    1|                              1.11110000|
> |    4|                              4.11110000|
> |    2|                              2.11110000|
> +-----+----------------------------------------+
> {code}
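> As a stop-gap (this sketch is not from the original report), the cast workaround can be applied once, right after the dataset is created, so that every later aggregation sees a correctly scaled column; the target scale of 8 here is an arbitrary choice:
> {code:java}
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.types.DecimalType;
> import static org.apache.spark.sql.functions.*;
>
> // Re-cast the bean-inferred decimal column a single time up front.
> Dataset<Row> fixed = ds.withColumn("var", col("var").cast(new DecimalType(38, 8)));
>
> // first()/last()/max() now operate on the re-cast column.
> fixed.groupBy(col("group")).agg(first("var")).show();
> {code}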



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
