Posted to jira@arrow.apache.org by "Yibo Cai (Jira)" <ji...@apache.org> on 2021/02/22 10:03:00 UTC

[jira] [Updated] (ARROW-11727) [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark

     [ https://issues.apache.org/jira/browse/ARROW-11727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibo Cai updated ARROW-11727:
-----------------------------
    Description: 
In the Flight benchmark, a Boost accumulator is used to estimate latency quantiles (0.5, 0.95, 0.99). Internally, Boost uses the P-Square algorithm [1], which estimates skewed (tail) quantiles such as 0.99 poorly. Tail quantiles are exactly where TDigest performs well.

Test results show that the actual 0.99 latency is much lower than what the current code reports, so we should switch to TDigest.
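The following is a minimal sketch of a merging t-digest, included to show why tail quantiles stay accurate. It assumes the arcsine (k1) scale function, a compression parameter `delta` of 100 and a batch buffer of 500 samples. The names `SimpleTDigest` and `Centroid` are hypothetical. This is not Arrow's own TDigest implementation and omits its refinements. The scale function keeps centroids near q = 0 and q = 1 small, so a 0.99 estimate is interpolated from small, closely spaced clusters.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// One cluster of samples: its mean and how many samples it absorbed.
struct Centroid {
  double mean;
  double weight;
};

// Hypothetical, simplified merging t-digest (illustration only, not Arrow's code).
class SimpleTDigest {
 public:
  explicit SimpleTDigest(double delta = 100.0, std::size_t buffer_size = 500)
      : delta_(delta), buffer_size_(buffer_size) {}

  // Record one sample (e.g. a request latency); merging is done in batches.
  void Add(double x) {
    buffer_.push_back({x, 1.0});
    min_ = std::min(min_, x);
    max_ = std::max(max_, x);
    if (buffer_.size() >= buffer_size_) Merge();
  }

  // Estimate the q-th quantile by linear interpolation between the centres of
  // neighbouring centroids, using the observed min/max at the two ends.
  double Quantile(double q) {
    assert(q >= 0.0 && q <= 1.0);
    Merge();
    assert(!centroids_.empty());
    const double target = q * total_;
    double cum = 0.0;          // weight of centroids left of the current one
    double prev_center = 0.0;  // cumulative weight at the previous centre
    double prev_mean = min_;
    for (const Centroid& c : centroids_) {
      const double center = cum + c.weight / 2.0;
      if (target <= center) {
        const double t = (target - prev_center) / (center - prev_center);
        return prev_mean + t * (c.mean - prev_mean);
      }
      cum += c.weight;
      prev_center = center;
      prev_mean = c.mean;
    }
    const double t = (target - prev_center) / (total_ - prev_center);
    return prev_mean + t * (max_ - prev_mean);
  }

 private:
  // k1(q) = delta / (2*pi) * asin(2q - 1). It is steep near q = 0 and q = 1,
  // so centroids in the tails are forced to stay small.
  double K(double q) const {
    const double pi = std::acos(-1.0);
    q = std::min(1.0, std::max(0.0, q));
    return delta_ / (2.0 * pi) * std::asin(2.0 * q - 1.0);
  }

  // Merge buffered samples with existing centroids in one sorted pass; a
  // centroid may only grow while it spans at most one unit of k.
  void Merge() {
    if (buffer_.empty()) return;
    buffer_.insert(buffer_.end(), centroids_.begin(), centroids_.end());
    std::sort(buffer_.begin(), buffer_.end(),
              [](const Centroid& a, const Centroid& b) { return a.mean < b.mean; });
    double total = 0.0;
    for (const Centroid& c : buffer_) total += c.weight;
    std::vector<Centroid> merged;
    Centroid cur = buffer_[0];
    double weight_before = 0.0;  // total weight of already-emitted centroids
    double k_left = K(0.0);
    for (std::size_t i = 1; i < buffer_.size(); ++i) {
      const Centroid& next = buffer_[i];
      const double q_right = (weight_before + cur.weight + next.weight) / total;
      if (K(q_right) - k_left <= 1.0) {
        const double w = cur.weight + next.weight;
        cur.mean += (next.mean - cur.mean) * next.weight / w;  // weighted mean
        cur.weight = w;
      } else {
        weight_before += cur.weight;
        merged.push_back(cur);
        k_left = K(weight_before / total);
        cur = next;
      }
    }
    merged.push_back(cur);
    centroids_ = std::move(merged);
    total_ = total;
    buffer_.clear();
  }

  double delta_;
  std::size_t buffer_size_;
  double total_ = 0.0;
  double min_ = std::numeric_limits<double>::infinity();
  double max_ = -std::numeric_limits<double>::infinity();
  std::vector<Centroid> centroids_;  // sorted by mean after each Merge()
  std::vector<Centroid> buffer_;     // samples not merged yet
};
```

In a benchmark loop, each latency would be passed to `Add()`, with `Quantile(0.99)` called at the end. Memory stays bounded by the centroid count (on the order of `delta`) plus the buffer, rather than growing with one entry per sample.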

- run flight-benchmark with default parameters
- calculate the 0.99 quantile of the latencies
- compare the exact value (computed by storing all data points), the TDigest estimate, and the Boost (P-Square) estimate
- repeat for 5 rounds (one row per round)
{noformat}
Exact  TDigest  Boost (P-Square)
86     93       2130
175    235      1526
151    165      1926
147    153      302
251    313      561
{noformat}
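The "Exact" column requires keeping every latency sample. Below is a minimal sketch of that baseline, together with a helper for comparing an estimate against it. It assumes the nearest-rank definition of a quantile, because the issue does not say which definition the test used. The names `ExactQuantile` and `RelativeError` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact q-th quantile (0 < q <= 1) of all stored samples, nearest-rank
// definition: the value at 1-based rank ceil(q * n) in sorted order.
// `samples` is taken by value because std::nth_element reorders it.
double ExactQuantile(std::vector<double> samples, double q) {
  assert(!samples.empty() && q > 0.0 && q <= 1.0);
  const double n = static_cast<double>(samples.size());
  std::size_t rank = static_cast<std::size_t>(std::ceil(q * n));
  rank = std::max<std::size_t>(1, std::min<std::size_t>(rank, samples.size()));
  // Partially sort so the element at index rank - 1 is in its sorted position.
  auto nth = samples.begin() + static_cast<std::ptrdiff_t>(rank - 1);
  std::nth_element(samples.begin(), nth, samples.end());
  return *nth;
}

// Relative error of an estimate against the exact value (0.08 means 8% too high).
double RelativeError(double estimate, double exact) {
  return (estimate - exact) / exact;
}
```

Consider the first row of the table. `RelativeError(93, 86)` is about 0.08 for TDigest. `RelativeError(2130, 86)` is about 23.8, meaning P-Square reports roughly 25 times the exact value.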

[1] https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf

  was:
In the Flight benchmark, a Boost accumulator is used to estimate latency quantiles (0.5, 0.95, 0.99). Internally, Boost uses the P-Square algorithm [1], which estimates skewed (tail) quantiles such as 0.99 poorly. Tail quantiles are exactly where TDigest performs well.

Test results show that the actual 0.99 latency is much lower than what the current code reports, so we should switch to TDigest.

- run flight-benchmark with default parameters
- calculate the 0.99 quantile of the latencies
- compare the exact value (computed by storing all data points), the TDigest estimate, and the Boost (P-Square) estimate
{noformat}
Exact  TDigest  Boost (P-Square)
86     93       2130
175    235      1526
151    165      1926
147    153      302
251    313      561
{noformat}

[1] https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf


> [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark
> -----------------------------------------------------------------------
>
>                 Key: ARROW-11727
>                 URL: https://issues.apache.org/jira/browse/ARROW-11727
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: FlightRPC
>            Reporter: Yibo Cai
>            Assignee: Yibo Cai
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)