Posted to jira@arrow.apache.org by "Yibo Cai (Jira)" <ji...@apache.org> on 2021/02/22 09:42:00 UTC

[jira] [Created] (ARROW-11727) [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark

Yibo Cai created ARROW-11727:
--------------------------------

             Summary: [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark
                 Key: ARROW-11727
                 URL: https://issues.apache.org/jira/browse/ARROW-11727
             Project: Apache Arrow
          Issue Type: Improvement
          Components: FlightRPC
            Reporter: Yibo Cai
            Assignee: Yibo Cai


The Flight benchmark uses a Boost accumulator to estimate latency quantiles (0.5, 0.95, 0.99). Internally, Boost uses the P-Square algorithm [1]. P-Square estimates skewed quantiles such as 0.99 very poorly, whereas TDigest handles them well.

Test results show that the actual 0.99 latency is much lower than what the current code reports. We should switch to TDigest.

- run flight-benchmark with default parameters
- calculate the 0.99 quantile of the latencies
- compare the exact value (computed from all stored data points) with the estimates from TDigest and from Boost
{noformat}
Exact  TDigest  Boost-P2
86     93       2130
175    235      1526
151    165      1926
147    153      302
251    313      561
{noformat}
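
The "exact" column above can be reproduced by keeping every latency sample and sorting. The following is a minimal sketch of such a baseline, not the benchmark's actual code: the function names ExactQuantile and RelativeError are hypothetical, and the choice of linear interpolation between the two closest ranks is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact quantile of a sample set, computed by keeping every data point.
// Assumption: linear interpolation between the two closest ranks; this
// convention is a choice of the sketch, not something stated in the issue.
double ExactQuantile(std::vector<double> samples, double q) {
  assert(!samples.empty());
  assert(q >= 0.0 && q <= 1.0);
  std::sort(samples.begin(), samples.end());
  // Fractional position of the quantile in the sorted samples.
  const double pos = q * static_cast<double>(samples.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
  const std::size_t hi = std::min(lo + 1, samples.size() - 1);
  const double frac = pos - static_cast<double>(lo);
  return samples[lo] + frac * (samples[hi] - samples[lo]);
}

// Relative error of an estimated quantile against the exact value.
double RelativeError(double estimate, double exact) {
  assert(exact != 0.0);
  return std::abs(estimate - exact) / std::abs(exact);
}
```

Applied to the first row of the table, RelativeError(93, 86) is about 0.08 for TDigest, while RelativeError(2130, 86) is above 23 for Boost P-Square. Storing all samples costs memory proportional to the number of requests, which is why it serves here only as a reference for judging the streaming estimators.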

[1] https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)