Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/08/10 12:36:16 UTC

[GitHub] [incubator-doris] e0c9 opened a new issue #6419: Support exact percentile aggregate function

e0c9 opened a new issue #6419:
URL: https://github.com/apache/incubator-doris/issues/6419


   **Is your feature request related to a problem? Please describe.**
   Doris currently supports only approximate percentile calculation, but some business scenarios require exact percentiles. Hive, Spark and Alicloud MaxCompute all support an exact percentile aggregate function.
   https://spark.apache.org/docs/latest/sql-ref-functions-builtin.html
   https://help.aliyun.com/document_detail/48975.html#title-x4d-jao-van
   
   **Describe the solution you'd like**
   Refer to Hive's implementation: https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/UDAFPercentile.java
   1. Count the occurrences of each value, producing `<value, count>` pairs.
   > 19,2,1,1,7,5,7,9,9,1 => <1,3> <2,1> <5,1> <7,2> <9,2> <19,1>
   2. Sort by value and replace each count with the cumulative count (the cumulative rank).
   > <1,3> <2,4> <5,5> <7,7> <9,9> <19,10>
   3. Scan the cumulative ranks linearly to locate the exact percentile, using linear interpolation if necessary. For the 0.25 percentile of these 10 values, the zero-based position is 0.25 * (10 - 1) = 2.25, which lies between position 2 (value 1) and position 3 (value 2).
   > percentile(value, 0.25)  = (3-2.25)*1 + (2.25 - 2)*2 = 1.25
   ```python
   import numpy as np

   # The same ten values as in step 1, sorted.
   a = np.array([1, 1, 1, 2, 5, 7, 7, 9, 9, 19])
   print(np.percentile(a, 25))  # prints 1.25
   ```
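The three steps above can be sketched in plain Python as follows. This is only an illustration of the described approach, not Hive's or Doris's actual implementation; the function name `exact_percentile` and its signature are hypothetical, and `p` is assumed to be a fraction in [0, 1].

```python
from bisect import bisect_right
from collections import Counter
from itertools import accumulate


def exact_percentile(values, p):
    """Exact percentile with linear interpolation (hypothetical helper)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")

    # Step 1: count the occurrences of each distinct value.
    counts = Counter(values)

    # Step 2: sort by value and turn the counts into cumulative ranks.
    keys = sorted(counts)
    cumulative = list(accumulate(counts[k] for k in keys))
    total = cumulative[-1]

    # Step 3: find the fractional zero-based position, then interpolate
    # between the values at the two neighbouring integer positions.
    position = p * (total - 1)
    lower = int(position)               # floor, because position >= 0
    upper = min(lower + 1, total - 1)   # clamp at the last position

    def value_at(index):
        # The first key whose cumulative count exceeds the zero-based index.
        return keys[bisect_right(cumulative, index)]

    lower_value = value_at(lower)
    upper_value = value_at(upper)
    return lower_value + (position - lower) * (upper_value - lower_value)


data = [19, 2, 1, 1, 7, 5, 7, 9, 9, 1]
print(exact_percentile(data, 0.25))  # prints 1.25
```

The sketch uses a binary search over the cumulative ranks where the proposal describes a linear scan; both locate the same pair of neighbouring values. Because only distinct values and their counts are kept, the state that has to be merged across partial aggregates is the `<value, count>` map rather than the full list of rows.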
   **Describe alternatives you've considered**
   A clear and concise description of any alternative solutions or features you've considered.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman closed issue #6419: [Feature] Support exact percentile aggregate function

Posted by GitBox <gi...@apache.org>.
morningman closed issue #6419:
URL: https://github.com/apache/incubator-doris/issues/6419


   



