Posted to issues@spark.apache.org by "Miquel (JIRA)" <ji...@apache.org> on 2018/11/30 13:06:00 UTC

[jira] [Created] (SPARK-26233) Incorrect decimal value with java beans and first/last/max... functions

Miquel created SPARK-26233:
------------------------------

             Summary: Incorrect decimal value with java beans and first/last/max... functions
                 Key: SPARK-26233
                 URL: https://issues.apache.org/jira/browse/SPARK-26233
             Project: Spark
          Issue Type: Bug
          Components: Java API
    Affects Versions: 2.4.0, 2.3.1
            Reporter: Miquel


Decimal values from Java beans are incorrectly scaled when used with functions like first/last/max...

This problem arises because Encoders.bean always maps BigDecimal fields to _DecimalType(this.MAX_PRECISION(), 18)._

Usually this is not a problem if you use numeric functions like *sum*, but for functions like *first*/*last*/*max*... the returned values are wrong.
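A quick way to see the schema the bean encoder infers (just a sketch, assuming Spark 2.3/2.4, where Encoder.schema() and StructType.printTreeString() are available):
{code:java}
// Inspect the schema inferred by the bean encoder: the BigDecimal field is
// mapped to decimal(38,18) regardless of the values it will actually hold.
Encoders.bean(Foo.class).schema().printTreeString();
// Expected to print something like:
// root
//  |-- group: string (nullable = true)
//  |-- var: decimal(38,18) (nullable = true)
{code}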

How to reproduce this error:

Using this class as an example:
{code:java}
import java.io.Serializable;
import java.math.BigDecimal;

public class Foo implements Serializable {

  private String group;
  private BigDecimal var;

  public BigDecimal getVar() {
    return var;
  }

  public void setVar(BigDecimal var) {
    this.var = var;
  }

  public String getGroup() {
    return group;
  }

  public void setGroup(String group) {
    this.group = group;
  }
}
{code}
 

And some dummy code to create a few objects:
{code:java}
Dataset<Foo> ds = spark.range(5)
    .map(l -> {
      Foo foo = new Foo();
      foo.setGroup("" + l);
      foo.setVar(BigDecimal.valueOf(l + 0.1111));
      return foo;
    }, Encoders.bean(Foo.class));
ds.printSchema();
ds.show();

+-----+------+
|group|   var|
+-----+------+
|    0|0.1111|
|    1|1.1111|
|    2|2.1111|
|    3|3.1111|
|    4|4.1111|
+-----+------+
{code}
We can see that the DecimalType has precision 38 and scale 18, and all values are shown correctly.

But if we use the first function, the values are scaled incorrectly:
{code:java}
ds.groupBy(col("group"))
    .agg(
        first("var")
    )
    .show();


+-----+-----------------+
|group|first(var, false)|
+-----+-----------------+
|    3|       3.1111E-14|
|    0|        1.111E-15|
|    1|       1.1111E-14|
|    4|       4.1111E-14|
|    2|       2.1111E-14|
+-----+-----------------+
{code}
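The magnitude of the error is consistent with the stored unscaled value being read back with the schema's scale of 18 instead of the bean value's actual scale. A plain-JDK illustration of that mismatch (not Spark's internal code, just the arithmetic):
{code:java}
// 0.1111 has unscaled value 1111 and scale 4; re-reading the same unscaled
// value with scale 18 yields 1.111E-15, which matches the output above.
BigDecimal original = BigDecimal.valueOf(1111, 4);   // 0.1111
BigDecimal misread  = BigDecimal.valueOf(1111, 18);  // 1.111E-15
System.out.println(original + " -> " + misread);
{code}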
This incorrect behavior cannot be reproduced if we use "numerical" functions like sum, or if the column is cast to a new DecimalType:
{code:java}
ds.groupBy(col("group"))
    .agg(
        sum("var")
    )
    .show();

+-----+--------------------+
|group|            sum(var)|
+-----+--------------------+
|    3|3.111100000000000000|
|    0|0.111100000000000000|
|    1|1.111100000000000000|
|    4|4.111100000000000000|
|    2|2.111100000000000000|
+-----+--------------------+

ds.groupBy(col("group"))
    .agg(
        first(col("var").cast(new DecimalType(38, 8)))
    )
    .show();

+-----+----------------------------------------+
|group|first(CAST(var AS DECIMAL(38,8)), false)|
+-----+----------------------------------------+
|    3|                              3.11110000|
|    0|                              0.11110000|
|    1|                              1.11110000|
|    4|                              4.11110000|
|    2|                              2.11110000|
+-----+----------------------------------------+
{code}
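Until the encoder is fixed, a possible workaround based on the cast above (just a sketch; it assumes DECIMAL(38,8) is the precision/scale you actually need) is to cast the column once, right after creating the Dataset, so every downstream aggregation sees a consistently scaled column:
{code:java}
// Cast the bean-derived decimal column up front; later first/last/max calls
// then operate on the explicitly cast column instead of the (38,18) bean type.
Dataset<Row> fixed = ds.withColumn("var", col("var").cast(new DecimalType(38, 8)));
fixed.groupBy(col("group"))
    .agg(first("var"))
    .show();
{code}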