Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/10 14:39:45 UTC

[GitHub] [arrow-datafusion] domodwyer opened a new issue #1538: Quantile support

domodwyer opened a new issue #1538:
URL: https://github.com/apache/arrow-datafusion/issues/1538


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I would like to efficiently aggregate (approximate) quantile values from a column of data - "show me the 99th percentile of the latency column in the requests table"
   
   **Describe the solution you'd like**
   Implement TDigest (or a similar algorithm) to provide relatively cheap quantile estimates.
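   As a rough illustration of the idea, here is a minimal, simplified merging t-digest using the k1 (arcsine) scale function. Everything in it (`TDigest`, `Centroid`, `compress`, the buffer threshold, compression 100) is a hypothetical sketch for discussion, not a DataFusion API or a reference implementation. Incoming values are buffered, then sorted and greedily merged into weighted centroids that stay small near the tails and grow near the median. Quantiles are read off by interpolating between centroid centres.

   ```rust
   use std::f64::consts::PI;

   /// A group of nearby values summarised by their mean and count (weight).
   #[derive(Clone, Copy, Debug)]
   struct Centroid {
       mean: f64,
       weight: f64,
   }

   /// Hypothetical, simplified merging t-digest (k1 scale function).
   struct TDigest {
       compression: f64,         // delta: larger keeps more centroids (more accurate)
       centroids: Vec<Centroid>, // sorted by mean after each compress()
       buffer: Vec<f64>,         // incoming values not yet merged
       total: f64,               // total weight held in `centroids`
       min: f64,
       max: f64,
   }

   impl TDigest {
       fn new(compression: f64) -> Self {
           TDigest { compression, centroids: Vec::new(), buffer: Vec::new(),
                     total: 0.0, min: f64::INFINITY, max: f64::NEG_INFINITY }
       }

       /// k1 scale function: steep near q = 0 and q = 1, so tail centroids stay small.
       fn k(&self, q: f64) -> f64 {
           self.compression / (2.0 * PI) * (2.0 * q - 1.0).clamp(-1.0, 1.0).asin()
       }

       fn add(&mut self, x: f64) {
           self.buffer.push(x);
           self.min = self.min.min(x);
           self.max = self.max.max(x);
           if self.buffer.len() >= 10 * self.compression as usize {
               self.compress();
           }
       }

       /// Sort buffered values and existing centroids together, then greedily merge
       /// neighbours while the merged centroid spans at most one unit of k.
       fn compress(&mut self) {
           if self.buffer.is_empty() {
               return;
           }
           let mut all: Vec<Centroid> = self.centroids.drain(..).collect();
           all.extend(self.buffer.drain(..).map(|x| Centroid { mean: x, weight: 1.0 }));
           all.sort_by(|a, b| a.mean.total_cmp(&b.mean));
           let total: f64 = all.iter().map(|c| c.weight).sum();

           let mut merged = Vec::new();
           let mut cur = all[0];
           let mut w_before = 0.0; // weight of centroids already emitted (left of `cur`)
           for next in all.into_iter().skip(1) {
               let q_left = w_before / total;
               let q_right = (w_before + cur.weight + next.weight) / total;
               if self.k(q_right) - self.k(q_left) <= 1.0 {
                   // Fold `next` into `cur`, keeping a weighted mean.
                   let w = cur.weight + next.weight;
                   cur.mean += (next.mean - cur.mean) * next.weight / w;
                   cur.weight = w;
               } else {
                   w_before += cur.weight;
                   merged.push(cur);
                   cur = next;
               }
           }
           merged.push(cur);
           self.centroids = merged;
           self.total = total;
       }

       /// Estimate quantile q by interpolating between centroid centres,
       /// anchored at the observed min (rank 0) and max (rank `total`).
       fn quantile(&mut self, q: f64) -> Option<f64> {
           self.compress();
           if self.centroids.is_empty() {
               return None;
           }
           let rank = q.clamp(0.0, 1.0) * self.total;
           let (mut prev_rank, mut prev_value) = (0.0, self.min);
           let mut w_before = 0.0;
           for c in &self.centroids {
               let center = w_before + c.weight / 2.0;
               if rank < center {
                   let t = (rank - prev_rank) / (center - prev_rank);
                   return Some(prev_value + t * (c.mean - prev_value));
               }
               prev_rank = center;
               prev_value = c.mean;
               w_before += c.weight;
           }
           let t = (rank - prev_rank) / (self.total - prev_rank);
           Some(prev_value + t * (self.max - prev_value))
       }
   }

   fn main() {
       let mut digest = TDigest::new(100.0);
       for ms in 1..=10_000 {
           digest.add(ms as f64);
       }
       let p99 = digest.quantile(0.99).unwrap();
       println!("p99 ~ {p99:.1} using {} centroids", digest.centroids.len());
   }
   ```

   Memory is bounded by the compression parameter rather than the row count. Because the state is just (mean, weight) pairs, partial digests could in principle be combined by concatenating centroids and compressing again, which is what would make this style of sketch fit a partitioned aggregation.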
   
   **Describe alternatives you've considered**
   I've had a look at some other DBs:
   
   * duckdb - tdigest & reservoir sampling
   * timescaledb - tdigest & uddsketch
   * snowflake - several options, including tdigest for cheap approximations
   * presto - qdigest
   * influxdb - tdigest
   
   For approximate results tdigest seems to be the popular choice, though the uddsketch paper is relatively new and also interesting.
   
   **Additional context**
   TDigest provides quantile estimations. I imagine it would be exposed as an `approx_quantile(column, quantile)` aggregation, in keeping with the naming of the `approx_distinct()` aggregation.
   
   Example:
   
   ```sql
   SELECT approx_quantile(latency, 0.99) AS p99 FROM requests;
   ```
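   For comparison, the exact value that `approx_quantile(latency, 0.99)` would estimate can be computed by sorting every value. The sketch below uses a hypothetical `exact_quantile` helper (not a DataFusion API) that interpolates linearly between the two closest ranks; other quantile definitions exist and give slightly different answers. It needs the whole column in memory and O(n log n) time, which is the cost an approximate sketch avoids.

   ```rust
   /// Hypothetical helper: exact quantile of `values` by full sort, using linear
   /// interpolation between the two closest ranks (one of several common definitions).
   fn exact_quantile(values: &[f64], q: f64) -> Option<f64> {
       if values.is_empty() || !(0.0..=1.0).contains(&q) {
           return None;
       }
       // Copies and sorts every value: O(n) memory, O(n log n) time.
       let mut sorted = values.to_vec();
       sorted.sort_by(|a, b| a.total_cmp(b));
       let pos = q * (sorted.len() - 1) as f64;
       let (lo, hi) = (pos.floor() as usize, pos.ceil() as usize);
       let frac = pos - lo as f64;
       Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
   }

   fn main() {
       // Stand-in for the `latency` column of the `requests` table.
       let latency: Vec<f64> = (1..=100).map(|ms| ms as f64).collect();
       println!("{:?}", exact_quantile(&latency, 0.99));
   }
   ```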
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb closed issue #1538: Quantile support

Posted by GitBox <gi...@apache.org>.
alamb closed issue #1538:
URL: https://github.com/apache/arrow-datafusion/issues/1538


   

