Posted to issues@spark.apache.org by "Daniel Darabos (JIRA)" <ji...@apache.org> on 2018/08/17 13:09:00 UTC

[jira] [Updated] (SPARK-25146) avg() returns null on some decimals

     [ https://issues.apache.org/jira/browse/SPARK-25146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Darabos updated SPARK-25146:
-----------------------------------
    Description: 
We compute some numbers in the 0-10 range in a pipeline using Spark SQL, then average them. In some cases the average comes out as {{null}}, to our surprise (and disappointment).

After a bit of digging it looks like these numbers have ended up with the {{decimal(37,30)}} type. I've got a Spark Shell (2.3.0 and 2.3.1) repro with this type:

{code}
scala> (1 to 10000).map(_*0.001).toDF.createOrReplaceTempView("x")

scala> spark.sql("select cast(value as decimal(37, 30)) as v from x").createOrReplaceTempView("x")

scala> spark.sql("select avg(v) from x").show
+------+
|avg(v)|
+------+
|  null|
+------+
{code}

With up to 4471 of these numbers, {{avg()}} returns a value. With 4472 or more, it returns {{null}}.
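
A quick arithmetic check may help narrow down this threshold. The sketch below is plain Scala with no Spark dependency. It computes the exact sum of the first n values of the repro (k * 0.001 for k = 1..n) and tests whether that sum fits in a given {{decimal(precision, scale)}}. The helper names {{sumOfFirst}} and {{fitsIn}} are made up for illustration. The idea that some intermediate step uses a type with only four integer digits, such as {{decimal(38,34)}}, is an assumption and is not confirmed anywhere in this report; the arithmetic only shows that the running sum first reaches 10000 at exactly n = 4472.

```scala
// Exact sum of k * 0.001 for k = 1..n, i.e. n(n+1)/2 thousandths.
// (The repro builds the values from doubles, so Spark's sum differs
// from this by a tiny rounding error, far too small to move the threshold.)
def sumOfFirst(n: Int): BigDecimal =
  BigDecimal(n.toLong * (n + 1) / 2) / 1000

// True if |x| has at most (precision - scale) integer digits, which is
// what a decimal(precision, scale) can hold. Rounding at the scale
// boundary is ignored to keep the sketch short.
def fitsIn(x: BigDecimal, precision: Int, scale: Int): Boolean =
  x.abs < BigDecimal(10).pow(precision - scale)

val lastWorking  = sumOfFirst(4471) // 9997.156
val firstFailing = sumOfFirst(4472) // 10001.628

// ASSUMPTION: decimal(38,34) is a guess at an intermediate type,
// chosen only because it leaves four integer digits.
val fitsBefore = fitsIn(lastWorking, 38, 34)  // true
val fitsAfter  = fitsIn(firstFailing, 38, 34) // false

// The same sum fits easily in decimal(38,30), which leaves eight
// integer digits; that would be consistent with sum() still working.
val fitsWiderType = fitsIn(firstFailing, 38, 30) // true
```

If this guess is right, the {{null}} would come from an overflow of the running sum in a narrower intermediate type rather than from the row count as such, but that would need to be confirmed against the Spark source.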

For now I'll just cast these numbers to {{double}}. But we got the {{decimal}} type entirely automatically; we never asked for it. If this is the default type, it's important to support averaging a handful of them. (Sorry for the bitterness. I like {{double}} more. :))

Curiously, {{sum()}} works. And {{count()}} too. So it's quite the surprise that {{avg()}} fails.


> avg() returns null on some decimals
> -----------------------------------
>
>                 Key: SPARK-25146
>                 URL: https://issues.apache.org/jira/browse/SPARK-25146
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1
>            Reporter: Daniel Darabos
>            Priority: Major


