Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/12/01 14:09:00 UTC
[jira] [Commented] (SPARK-26233) Incorrect decimal value with java
beans and first/last/max... functions
[ https://issues.apache.org/jira/browse/SPARK-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705835#comment-16705835 ]
Hyukjin Kwon commented on SPARK-26233:
--------------------------------------
Hi [~mcanes], it looks like this is not specific to Java beans. Would you mind making a reproducer with plain Scala collections? There are some decimal precision issues going on, and a minimised reproducer would help identify whether this is a duplicate.
> Incorrect decimal value with java beans and first/last/max... functions
> -----------------------------------------------------------------------
>
> Key: SPARK-26233
> URL: https://issues.apache.org/jira/browse/SPARK-26233
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 2.3.1, 2.4.0
> Reporter: Miquel
> Priority: Minor
>
> Decimal values from Java beans are incorrectly scaled when used with functions like first/last/max...
> This problem occurs because Encoders.bean always sets decimal values as _DecimalType(this.MAX_PRECISION(), 18)_.
> Usually this is not a problem if you use numeric functions like *sum*, but for functions like *first*/*last*/*max*... it is.
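> The mechanism can be sketched with plain java.math.BigDecimal, independent of Spark (a hypothetical illustration of the scale mismatch, not Spark's actual code path): a value created with scale 4 whose unscaled digits are reinterpreted under the encoder's fixed scale of 18 yields exactly the corrupted numbers shown below.
> {code:java}
> import java.math.BigDecimal;
>
> public class ScaleDemo {
>     public static void main(String[] args) {
>         // BigDecimal.valueOf(0.1111) carries scale 4 and unscaled value 1111.
>         BigDecimal original = BigDecimal.valueOf(0.1111);
>         System.out.println(original.scale());         // 4
>         System.out.println(original.unscaledValue()); // 1111
>
>         // Reinterpreting the same unscaled digits at scale 18
>         // (the fixed scale from Encoders.bean) shifts the value:
>         BigDecimal misread = new BigDecimal(original.unscaledValue(), 18);
>         System.out.println(misread);                  // 1.111E-15
>     }
> }
> {code}
> Note how 1.111E-15 matches the first(var) output for group 0 below: the digits survive, but the decimal point moves by 18 - 4 = 14 places.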
> How to reproduce this error:
> Using this class as an example:
> {code:java}
> public class Foo implements Serializable {
>     private String group;
>     private BigDecimal var;
>
>     public BigDecimal getVar() {
>         return var;
>     }
>
>     public void setVar(BigDecimal var) {
>         this.var = var;
>     }
>
>     public String getGroup() {
>         return group;
>     }
>
>     public void setGroup(String group) {
>         this.group = group;
>     }
> }
> {code}
>
> And a dummy code to create some objects:
> {code:java}
> Dataset<Foo> ds = spark.range(5)
>     .map(l -> {
>         Foo foo = new Foo();
>         foo.setGroup("" + l);
>         foo.setVar(BigDecimal.valueOf(l + 0.1111));
>         return foo;
>     }, Encoders.bean(Foo.class));
> ds.printSchema();
> ds.show();
> +-----+------+
> |group| var|
> +-----+------+
> | 0|0.1111|
> | 1|1.1111|
> | 2|2.1111|
> | 3|3.1111|
> | 4|4.1111|
> +-----+------+
> {code}
> We can see that the DecimalType has precision 38 and scale 18, and all values are shown correctly.
> But if we use the first function, the values are scaled incorrectly:
> {code:java}
> ds.groupBy(col("group"))
>     .agg(
>         first("var")
>     )
>     .show();
> +-----+-----------------+
> |group|first(var, false)|
> +-----+-----------------+
> | 3| 3.1111E-14|
> | 0| 1.111E-15|
> | 1| 1.1111E-14|
> | 4| 4.1111E-14|
> | 2| 2.1111E-14|
> +-----+-----------------+
> {code}
> This incorrect behavior cannot be reproduced if we use "numerical" functions like *sum*, or if the column is cast to a new DecimalType:
> {code:java}
> ds.groupBy(col("group"))
>     .agg(
>         sum("var")
>     )
>     .show();
> +-----+--------------------+
> |group| sum(var)|
> +-----+--------------------+
> | 3|3.111100000000000000|
> | 0|0.111100000000000000|
> | 1|1.111100000000000000|
> | 4|4.111100000000000000|
> | 2|2.111100000000000000|
> +-----+--------------------+
> ds.groupBy(col("group"))
>     .agg(
>         first(col("var").cast(new DecimalType(38, 8)))
>     )
>     .show();
> +-----+----------------------------------------+
> |group|first(CAST(var AS DECIMAL(38,8)), false)|
> +-----+----------------------------------------+
> | 3| 3.11110000|
> | 0| 0.11110000|
> | 1| 1.11110000|
> | 4| 4.11110000|
> | 2| 2.11110000|
> +-----+----------------------------------------+
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org