Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/10/06 16:31:17 UTC

[GitHub] [orc] guiyanakuang commented on pull request #915: ORC-98: Add support for t-digests to ORC

guiyanakuang commented on pull request #915:
URL: https://github.com/apache/orc/pull/915#issuecomment-936629718


   To show the benefits of custom statistics, I have completed a benchmark (dba9a1a) using the current implementation for the time being.
   I added OptimizeFilterBenchmark. It measures the performance of the default query, and of the same query with its filter conditions re-ordered based on the percentage of filter values, as given by TDigest.
   
   proportion: ratio of cardinality between columns
   quota: minimum cardinality
   
   ```
   Benchmark                            (proportion)  (quota)  Mode  Cnt     Score     Error  Units
   OptimizeFilterBenchmark.noUseTDigest             2       10  avgt   20  1052.305 ±  10.632  us/op
   OptimizeFilterBenchmark.noUseTDigest             2      100  avgt   20  1109.375 ±  10.162  us/op
   OptimizeFilterBenchmark.noUseTDigest             2     1000  avgt   20  1173.790 ±  11.696  us/op
   OptimizeFilterBenchmark.noUseTDigest             3       10  avgt   20  1056.139 ±   8.359  us/op
   OptimizeFilterBenchmark.noUseTDigest             3      100  avgt   20  1154.665 ±   9.152  us/op
   OptimizeFilterBenchmark.noUseTDigest             3     1000  avgt   20  1168.113 ±   9.115  us/op
   OptimizeFilterBenchmark.useTDigest               2       10  avgt   20  1116.076 ±   6.330  us/op
   OptimizeFilterBenchmark.useTDigest               2      100  avgt   20  1162.956 ±   9.865  us/op
   OptimizeFilterBenchmark.useTDigest               2     1000  avgt   20  1220.028 ±  22.544  us/op
   OptimizeFilterBenchmark.useTDigest               3       10  avgt   20  1114.617 ±  10.220  us/op
   OptimizeFilterBenchmark.useTDigest               3      100  avgt   20  1219.488 ± 138.798  us/op
   OptimizeFilterBenchmark.useTDigest               3     1000  avgt   20   651.001 ±  20.784  us/op
   ```
   
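   The tests show some overhead from reading the custom statistics structures: useTDigest is roughly 45 to 65 us/op slower than noUseTDigest in five of the six configurations. When the filter conditions include sparse values, reordering them gives a much larger gain because it allows early pruning; at proportion 3 and quota 1000, useTDigest takes 651.001 us/op against 1168.113 us/op for noUseTDigest.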
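
To illustrate why ordering matters for early pruning, here is a minimal, self-contained sketch. It is not the code from this PR and does not use any ORC API: `FilterReorderSketch`, `Conjunct`, `reorder` and `countEvaluations` are hypothetical names, and the selectivity values are hard-coded stand-ins for estimates that would come from column statistics such as a t-digest. It assumes an AND filter that short-circuits on the first failing condition and counts how many condition evaluations each ordering needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.LongPredicate;

public class FilterReorderSketch {

    /** One condition of an AND filter (hypothetical type, not an ORC class). */
    static final class Conjunct {
        final int column;            // index of the column the condition reads
        final LongPredicate test;    // the condition itself
        final double selectivity;    // assumed estimate of the fraction of rows that pass

        Conjunct(int column, LongPredicate test, double selectivity) {
            this.column = column;
            this.test = test;
            this.selectivity = selectivity;
        }
    }

    /** Returns a copy ordered so the condition expected to pass the fewest rows runs first. */
    static List<Conjunct> reorder(List<Conjunct> conjuncts) {
        List<Conjunct> sorted = new ArrayList<>(conjuncts);
        sorted.sort(Comparator.comparingDouble((Conjunct c) -> c.selectivity));
        return sorted;
    }

    /** Applies the AND filter to every row and returns the number of condition evaluations. */
    static long countEvaluations(long[][] rows, List<Conjunct> conjuncts) {
        long evaluations = 0;
        for (long[] row : rows) {
            for (Conjunct c : conjuncts) {
                evaluations++;
                if (!c.test.test(row[c.column])) {
                    break; // early pruning: the remaining conditions are skipped for this row
                }
            }
        }
        return evaluations;
    }

    public static void main(String[] args) {
        // Column 0 has 2 distinct values, column 1 has 1000 distinct values.
        long[][] rows = new long[1000][2];
        for (int i = 0; i < rows.length; i++) {
            rows[i][0] = i % 2;
            rows[i][1] = i;
        }
        List<Conjunct> original = Arrays.asList(
                new Conjunct(0, v -> v == 0, 0.5),     // passes about half the rows
                new Conjunct(1, v -> v == 42, 0.001)); // passes a single row
        System.out.println("original order: " + countEvaluations(rows, original));
        System.out.println("reordered:      " + countEvaluations(rows, reorder(original)));
    }
}
```

In this toy data set the reordered filter rejects almost every row on its first condition, so it performs fewer evaluations than the original order. The benefit shrinks when all conditions have similar selectivity, in which case only the cost of reading the statistics remains.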

