You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2021/11/15 22:22:00 UTC

[jira] [Commented] (HIVE-25653) Incorrect results returned by STDDEV, STDDEV_SAMP, STDDEV_POP for floating point data types.

    [ https://issues.apache.org/jira/browse/HIVE-25653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444137#comment-17444137 ] 

Stamatis Zampetakis commented on HIVE-25653:
--------------------------------------------

Hi [~sankarh] , [~ashish-kumar-sharma] , I am not completely sure if this is a bug or it can be considered normal behavior.

Moreover, the summary and the scenario do not seem to be inline. The summary mentions floating point data types so I assume we talk about aproximate numeric types and the example contains a table of integers which is exact numeric.

The following scenario with approximate numerics in Postgres gives back a result which resembles the result we get in Hive.
{code:sql}
create table tbl_double ( col1 double precision);
insert into tbl_double values ('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72');
select STDDEV_SAMP(col1) AS STDDEV_6M , STDDEV(col1) as STDDEV ,STDDEV_POP(col1) as STDDEV_POP from tbl_double;
{code}

{noformat}
      stddev_6m       |        stddev        |      stddev_pop       
----------------------+----------------------+-----------------------
 5.42317860890711e-13 | 5.42317860890711e-13 | 5.020887760782747e-13
{noformat}

I think that trying the same scenario in different databases will give different results and it depends on various things. The result type of the UDFs which the SQL standard defines as implementation specific approximate numeric and the implicit conversion and rules holding in each database.

> Incorrect results returned by STDDEV, STDDEV_SAMP, STDDEV_POP for floating point data types.
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25653
>                 URL: https://issues.apache.org/jira/browse/HIVE-25653
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>    Affects Versions: 3.1.0, 3.1.2
>            Reporter: Ashish Sharma
>            Assignee: Ashish Sharma
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Description
> *Script*- 
> create table test ( col1 int );
> insert into values ('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72');
> select STDDEV_SAMP(col1) AS STDDEV_6M , STDDEV(col1) as STDDEV ,STDDEV_POP(col1) as STDDEV_POP from test;
> *Result*- 
> STDDDEV_SAMP                            STDDEV                      STDDEV_POP 
> 5.940794514955821E-13     5.42317860890711E-13         5.42317860890711E-13
> *Expected*- 
> STDDDEV_SAMP                            STDDEV                      STDDEV_POP 
> 0                                                           0                                0



--
This message was sent by Atlassian Jira
(v8.20.1#820001)