Posted to issues@spark.apache.org by "Miquel (JIRA)" <ji...@apache.org> on 2018/11/30 13:06:00 UTC
[jira] [Created] (SPARK-26233) Incorrect decimal value with java beans and first/last/max... functions
Miquel created SPARK-26233:
------------------------------
Summary: Incorrect decimal value with java beans and first/last/max... functions
Key: SPARK-26233
URL: https://issues.apache.org/jira/browse/SPARK-26233
Project: Spark
Issue Type: Bug
Components: Java API
Affects Versions: 2.4.0, 2.3.1
Reporter: Miquel
Decimal values from Java beans are incorrectly scaled when used with functions like first/last/max...
This problem arises because Encoders.bean always sets decimal values as _DecimalType(this.MAX_PRECISION(), 18)._
Usually this is not a problem with numeric functions like *sum*, but for functions like *first*/*last*/*max*... it is.
How to reproduce this error:
Using this class as an example:
{code:java}
import java.io.Serializable;
import java.math.BigDecimal;

public class Foo implements Serializable {

    private String group;
    private BigDecimal var;

    public BigDecimal getVar() {
        return var;
    }

    public void setVar(BigDecimal var) {
        this.var = var;
    }

    public String getGroup() {
        return group;
    }

    public void setGroup(String group) {
        this.group = group;
    }
}
{code}
And a dummy code to create some objects:
{code:java}
Dataset<Foo> ds = spark.range(5)
        .map(l -> {
            Foo foo = new Foo();
            foo.setGroup("" + l);
            foo.setVar(BigDecimal.valueOf(l + 0.1111));
            return foo;
        }, Encoders.bean(Foo.class));

ds.printSchema();
ds.show();
+-----+------+
|group| var|
+-----+------+
| 0|0.1111|
| 1|1.1111|
| 2|2.1111|
| 3|3.1111|
| 4|4.1111|
+-----+------+
{code}
We can see that the DecimalType has precision 38 and scale 18, and all values are shown correctly.
But if we use the first function, they are scaled incorrectly:
{code:java}
ds.groupBy(col("group"))
        .agg(first("var"))
        .show();
+-----+-----------------+
|group|first(var, false)|
+-----+-----------------+
| 3| 3.1111E-14|
| 0| 1.111E-15|
| 1| 1.1111E-14|
| 4| 4.1111E-14|
| 2| 2.1111E-14|
+-----+-----------------+
{code}
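The wrong numbers are consistent with the bean's unscaled digits being reinterpreted at the encoder's fixed scale of 18: 3.1111 has unscaled value 31111 at scale 4, and 31111 at scale 18 is 3.1111E-14. A minimal plain-Java sketch of this arithmetic (an illustration of the observed behavior, not Spark internals; no Spark required):

{code:java}
import java.math.BigDecimal;
import java.math.BigInteger;

public class ScaleBugDemo {
    public static void main(String[] args) {
        BigDecimal original = BigDecimal.valueOf(3.1111);   // scale 4, unscaled value 31111
        BigInteger unscaled = original.unscaledValue();

        // Reinterpret the same unscaled digits at the encoder's fixed scale of 18,
        // which matches what first()/last()/max() print above:
        BigDecimal corrupted = new BigDecimal(unscaled, 18);

        System.out.println(corrupted); // prints 3.1111E-14, matching the buggy output
    }
}
{code}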
This incorrect behavior cannot be reproduced with numeric functions like sum, or if the column is cast to a new DecimalType:
{code:java}
ds.groupBy(col("group"))
        .agg(sum("var"))
        .show();
+-----+--------------------+
|group| sum(var)|
+-----+--------------------+
| 3|3.111100000000000000|
| 0|0.111100000000000000|
| 1|1.111100000000000000|
| 4|4.111100000000000000|
| 2|2.111100000000000000|
+-----+--------------------+
ds.groupBy(col("group"))
        .agg(first(col("var").cast(new DecimalType(38, 8))))
        .show();
+-----+----------------------------------------+
|group|first(CAST(var AS DECIMAL(38,8)), false)|
+-----+----------------------------------------+
| 3| 3.11110000|
| 0| 0.11110000|
| 1| 1.11110000|
| 4| 4.11110000|
| 2| 2.11110000|
+-----+----------------------------------------+
{code}
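In plain BigDecimal terms, the explicit cast pins the scale the aggregate then reports, so the digits line up with the declared type. A sketch of the equivalent rescaling (an illustration only, not Spark's cast implementation):

{code:java}
import java.math.BigDecimal;

public class CastDemo {
    public static void main(String[] args) {
        BigDecimal original = BigDecimal.valueOf(3.1111);   // scale 4

        // Increasing the scale to 8 only pads zeros; the numeric value is unchanged:
        BigDecimal rescaled = original.setScale(8);

        System.out.println(rescaled); // prints 3.11110000, as in the DECIMAL(38,8) output
    }
}
{code}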
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)