Posted to jira@arrow.apache.org by "Yibo Cai (Jira)" <ji...@apache.org> on 2020/09/23 04:16:00 UTC

[jira] [Created] (ARROW-10070) [C++][Compute] Implement stdev aggregate kernel

Yibo Cai created ARROW-10070:
--------------------------------

             Summary: [C++][Compute] Implement stdev aggregate kernel
                 Key: ARROW-10070
                 URL: https://issues.apache.org/jira/browse/ARROW-10070
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
            Reporter: Yibo Cai
            Assignee: Yibo Cai


Compute the standard deviation of an array or chunked array.

I would prefer the two-pass algorithm [1] as a balance between numerical stability and performance; NumPy uses this method to compute variance [2].
Welford's online algorithm [3] is more stable, but also more computationally expensive.
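For comparison, here is a minimal standalone sketch of both approaches over chunked input. It is not Arrow's kernel API. `Chunks` is a hypothetical stand-in for a chunked array (a list of contiguous chunks of doubles), and the `ddof` parameter is an assumption modeled on NumPy's `ddof` argument (0 gives the population variance, 1 the sample variance). The standard deviation is the square root of either result.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a chunked array: a list of contiguous chunks.
// This is NOT the arrow::ChunkedArray API, just plain std::vector.
using Chunks = std::vector<std::vector<double>>;

// Returns NaN when there are not enough values for the requested ddof.
inline double NotEnoughValues() {
  return std::numeric_limits<double>::quiet_NaN();
}

// Two-pass algorithm: pass 1 computes the mean over all chunks,
// pass 2 accumulates squared deviations from that mean.
// ddof is assumed to be >= 0.
double TwoPassVariance(const Chunks& chunks, int ddof = 0) {
  std::size_t n = 0;
  double sum = 0.0;
  for (const auto& chunk : chunks) {
    for (double x : chunk) {
      sum += x;
      ++n;
    }
  }
  if (n <= static_cast<std::size_t>(ddof)) return NotEnoughValues();
  const double mean = sum / static_cast<double>(n);

  double m2 = 0.0;  // sum of squared deviations from the mean
  for (const auto& chunk : chunks) {
    for (double x : chunk) {
      const double d = x - mean;
      m2 += d * d;
    }
  }
  return m2 / static_cast<double>(n - ddof);
}

// Welford's online algorithm: a single pass that updates the running mean
// and the running sum of squared deviations (m2) for every element.
// The per-element division is part of why it costs more than two-pass.
double WelfordVariance(const Chunks& chunks, int ddof = 0) {
  std::size_t n = 0;
  double mean = 0.0;
  double m2 = 0.0;
  for (const auto& chunk : chunks) {
    for (double x : chunk) {
      ++n;
      const double delta = x - mean;        // deviation from old mean
      mean += delta / static_cast<double>(n);
      m2 += delta * (x - mean);             // uses old and new mean
    }
  }
  if (n <= static_cast<std::size_t>(ddof)) return NotEnoughValues();
  return m2 / static_cast<double>(n - ddof);
}
```

For the values {2, 4, 4, 4, 5, 5, 7, 9} split into the chunks {2, 4, 4}, {4, 5}, {5, 7, 9}, both functions should give a population variance of 4, i.e. a standard deviation of 2. The two-pass version reads the data twice but does only additions and multiplications in its inner loops, which is the performance trade-off described above.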

[1] https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Two-pass_algorithm
[2] https://github.com/numpy/numpy/blob/92ebe1e9a6aeb47a881a1226b08218175776f9ea/numpy/core/_methods.py#L176
[3] https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm



--
This message was sent by Atlassian Jira
(v8.3.4#803005)