Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/25 07:12:48 UTC

[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #2082: 2078 docs for approx functions

xudong963 commented on a change in pull request #2082:
URL: https://github.com/apache/arrow-datafusion/pull/2082#discussion_r835006509



##########
File path: docs/source/user-guide/sql/aggregate_functions.md
##########
@@ -0,0 +1,69 @@
+# Aggregate Functions
+
+Aggregate functions operate on a set of values to compute a single result.
+
+## General
+
+### min
+`min(x) -> x` returns the minimum value of all input values.
+
+### max
+`max(x) -> x` returns the maximum value of all input values.
+
+### count
+`count(1) -> uint64` returns the number of input rows, including rows with null values.
+
+`count(*) -> uint64` returns the number of input rows, including rows with null values.
+
+`count(x) -> uint64` returns the number of non-null input values.
+
+`count(distinct x) -> uint64` returns the number of non-null distinct input values.
+
+### avg
+`avg(x) -> float64` returns the average (arithmetic mean) of input values.
+
+### sum
+`sum(x) -> same as x` returns the sum of all input values.
+
+### array_agg
+`array_agg(x) -> array<x>` returns an array created from the input values.
+
+## Approximate
+
+### approx_distinct
+`approx_distinct(x) -> uint64` returns the approximate number of distinct input values, estimated with HyperLogLog.
+
+### approx_median
+`approx_median(x) -> x` returns the approximate median of input values.
+
+It is the same as `approx_percentile_cont(x, 0.5)`.
+
+### approx_percentile_cont
+`approx_percentile_cont(x, p) -> x` returns the approximate percentile (TDigest) of input values, where `p` is a float64 between 0 and 1 (inclusive).
+
+It takes raw data as input and builds TDigest sketches at query time.
+
+### approx_percentile_cont_with_weight
+`approx_percentile_cont_with_weight(x, w, p) -> x` returns the approximate weighted percentile (TDigest) of input values, where `w` is the weight column expression and `p` is a float64 between 0 and 1 (inclusive).
+
+It takes either raw data or pre-aggregated TDigest sketches (mean and weight) as input, then builds and merges TDigest sketches at query time.
+It is suitable for low-latency OLAP systems in which `Spark Streaming/Flink` pre-aggregates data into a data store that is then queried by DataFusion.
+
+## Statistical

Review comment:
       WIP?
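
For readers following the `count` entries in the diff above, the difference between the four forms can be sketched in plain Python. This is only an illustration of the documented semantics, not DataFusion code: the helper names `count_star`, `count_non_null`, and `count_distinct` are hypothetical, and `None` stands in for SQL NULL.

```python
from typing import Hashable, Optional, Sequence


def count_star(rows: Sequence[Optional[Hashable]]) -> int:
    """Mimics count(*) and count(1): every row counts, nulls included."""
    return len(rows)


def count_non_null(rows: Sequence[Optional[Hashable]]) -> int:
    """Mimics count(x): rows whose value is null are skipped."""
    return sum(1 for value in rows if value is not None)


def count_distinct(rows: Sequence[Optional[Hashable]]) -> int:
    """Mimics count(distinct x): nulls are skipped, duplicates collapse."""
    return len({value for value in rows if value is not None})


# Hypothetical sample column: five rows, two of them null, one duplicate.
values = [3, None, 3, 7, None]

print(count_star(values))      # 5
print(count_non_null(values))  # 3
print(count_distinct(values))  # 2
```

The three results differ only because of the two nulls and the repeated `3`, which is the distinction the doc entries are describing.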

##########
File path: docs/source/user-guide/sql/sql_status.md
##########
@@ -76,6 +76,9 @@
   - [x] nullif
 - Approximation functions
   - [x] approx_distinct
+  - [x] approx_median

Review comment:
       👍🏻
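
As background for the `approx_median` entry, the documented equivalence with `approx_percentile_cont(x, 0.5)` can be checked against an exact computation. The sketch below is an assumption-laden illustration: `percentile_cont` is a hypothetical helper that sorts the data and interpolates linearly between neighbours, which is the usual meaning of a continuous percentile. It does not use TDigest, so DataFusion's approximate results may differ slightly from it on large inputs.

```python
import statistics
from typing import Sequence


def percentile_cont(values: Sequence[float], p: float) -> float:
    """Exact continuous percentile using linear interpolation.

    `p` must lie in [0, 1], matching the range documented in the diff.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1 (inclusive)")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Fractional position of the requested percentile in the sorted data.
    position = p * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical sample data.
data = [4.0, 1.0, 3.0, 2.0]

# The 0.5 percentile coincides with the exact median.
print(percentile_cont(data, 0.5) == statistics.median(data))
```

An approximate implementation trades this exactness for bounded memory, which is why the docs label both functions as approximate.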



