You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Mich Talebzadeh (JIRA)" <ji...@apache.org> on 2016/04/21 11:04:25 UTC
[jira] [Updated] (HIVE-13574) Standard Deviation in Hive 2 is still
incorrect
[ https://issues.apache.org/jira/browse/HIVE-13574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mich Talebzadeh updated HIVE-13574:
-----------------------------------
Component/s: (was: Build Infrastructure)
> Standard Deviation in Hive 2 is still incorrect
> -----------------------------------------------
>
> Key: HIVE-13574
> URL: https://issues.apache.org/jira/browse/HIVE-13574
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.0.0
> Environment: All
> Reporter: Mich Talebzadeh
> Assignee: Mich Talebzadeh
> Priority: Minor
> Labels: patch
> Fix For: 0.10.1
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> I reported back in April 2015 that what Hive calls Standard Deviation Function STDDEV is a pointer to STDDEV_POP. This is incorrect and has not been rectified in Hive 2
> Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also I did tests with Spark 1.6 as well and Spark correctly points STTDEV to STDDEV_SAMP.
> The following query was used
> SELECT
> SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1)) AS MYSTDDEV,
> STDDEV(amount_sold) AS STDDEV,
> STDDEV_SAMP(amount_sold) AS STDDEV_SAMP,
> STDDEV_POP(amount_sold) AS STDDEV_POP
> from sales;
> The following is from running the above query on Hive where STDDEV --> STDDEV_POP which is incorrect
> +--------------------+---------------------+--------------------+---------------------+--+
> | mystddev | stddev | stddev_samp | stddev_pop |
> +--------------------+---------------------+--------------------+---------------------+--+
> | 260.7270919450411 | 260.72704617040444 | 260.7270722861465 | 260.72704617040444 |
> +--------------------+---------------------+--------------------+---------------------+--+
> The following is from Spark-sql where STDDEV --> STDDEV_SAMP which is correct
> +--------------------+---------------------+--------------------+---------------------+--+
> | mystddev | stddev | stddev_samp | stddev_pop |
> +--------------------+---------------------+--------------------+---------------------+--+
> | 260.7270919450411 | 260.7270722861637 | 260.7270722861637 | 260.72704617042166 |
> +--------------------+---------------------+--------------------+---------------------+--+
> Hopefully The Hive one will be corrected.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)