Posted to issues@kylin.apache.org by "Xiaoxiang Yu (Jira)" <ji...@apache.org> on 2020/07/31 12:03:00 UTC

[jira] [Closed] (KYLIN-3361) Add a two layer udaf stddev_sum

     [ https://issues.apache.org/jira/browse/KYLIN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoxiang Yu closed KYLIN-3361.
-------------------------------

Resolved in release 3.1.0 (2020-07-03)

> Add a two layer udaf stddev_sum
> -------------------------------
>
>                 Key: KYLIN-3361
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3361
>             Project: Kylin
>          Issue Type: New Feature
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>            Priority: Major
>              Labels: UDAF
>             Fix For: v3.1.0
>
>
> (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + ... + (x_n - \bar{x})^2 = x_1^2 + x_2^2 + ... + x_n^2 - n \bar{x}^2, where \bar{x} is the average of x_1, x_2, ..., x_n. Therefore, to compute stddev, Kylin only needs to pre-calculate sum(x_i^2), sum(x_i), and count.
>  
> var(X) = E(X^2) - E(X)^2
> var'(X) = n * var(X)
>         = n * (E(X^2) - E(X)^2)
>         = S(X^2) - S(X)^2 / n
>         = S(X_1^2) + S(X_2^2) - (S(X_1) + S(X_2))^2 / (n_1 + n_2)
>         = S(X_1^2) - S(X_1)^2 / n_1 + S(X_2^2) - S(X_2)^2 / n_2 + S(X_1)^2 / n_1 + S(X_2)^2 / n_2 - (S(X_1) + S(X_2))^2 / (n_1 + n_2)
>         = var'(X_1) + var'(X_2) + (S(X_2) n_1 - S(X_1) n_2)^2 / (n_1 n_2 (n_1 + n_2))
>         = var'(X_1) + var'(X_2) + (S(X_2) - S(X_1) n_2 / n_1)^2 n_1 / (n_2 (n_1 + n_2))
>
> Here S(.) is the sum over a set of values and n is its count; X_1 and X_2 are two disjoint parts of X with counts n_1 and n_2.
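
The first identity in the description can be checked with a small sketch. It assumes the population form of the standard deviation (dividing by n), which matches var(X) = E(X^2) - E(X)^2 in the description; the function name `stddev_from_aggregates` is hypothetical and is not taken from the Kylin code base.

```python
import math


def stddev_from_aggregates(count, total, total_sq):
    """Population standard deviation from count, sum(x) and sum(x^2)."""
    if count <= 0:
        raise ValueError("count must be positive")
    # sum((x_i - mean)^2) = sum(x_i^2) - n * mean^2 = sum(x_i^2) - sum(x_i)^2 / n
    sq_dev = total_sq - total * total / count
    # Clamp tiny negative values that floating-point rounding can produce.
    return math.sqrt(max(sq_dev, 0.0) / count)


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
result = stddev_from_aggregates(
    len(values), sum(values), sum(v * v for v in values)
)
```

Only the three pre-calculated aggregates are needed at query time; the raw values appear here solely to build them.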

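The merge step in the last lines of the derivation can be sketched the same way. This is only an illustration under assumptions: each partial result is represented as a hypothetical tuple (n, S, var') where var' = n * var, and the helper names `partial` and `merge` are invented for this example rather than taken from Kylin.

```python
def partial(values):
    """Build a partial aggregate (n, S, var') from raw values, var' = n * var."""
    n = len(values)
    s = sum(values)
    m2 = sum(v * v for v in values) - s * s / n if n else 0.0
    return (n, s, m2)


def merge(a, b):
    """Combine two partial aggregates using the cross term from the derivation."""
    n1, s1, m1 = a
    n2, s2, m2 = b
    if n1 == 0:
        return b
    if n2 == 0:
        return a
    # var'(X) = var'(X_1) + var'(X_2)
    #           + (S(X_2) n_1 - S(X_1) n_2)^2 / (n_1 n_2 (n_1 + n_2))
    cross = (s2 * n1 - s1 * n2) ** 2 / (n1 * n2 * (n1 + n2))
    return (n1 + n2, s1 + s2, m1 + m2 + cross)


merged = merge(partial([2.0, 4.0, 4.0]), partial([4.0, 5.0, 5.0, 7.0, 9.0]))
whole = partial([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Merging the two parts should give the same (n, S, var') as aggregating all values at once, up to floating-point rounding.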


--
This message was sent by Atlassian Jira
(v8.3.4#803005)