Posted to jira@arrow.apache.org by "Yibo Cai (Jira)" <ji...@apache.org> on 2021/02/22 10:03:00 UTC
[jira] [Updated] (ARROW-11727) [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark
[ https://issues.apache.org/jira/browse/ARROW-11727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yibo Cai updated ARROW-11727:
-----------------------------
Description:
In the Flight benchmark, a Boost accumulator is used to estimate latency quantiles (0.5, 0.95, 0.99). Internally, Boost uses the P-Square algorithm [1]. P-Square is inaccurate for quantiles in the tail of a skewed distribution, such as 0.99, where TDigest does well.
Test results show that the actual 0.99 latency is much lower than what the current code reports. We should switch to TDigest.
- run flight-benchmark with default parameters
- calculate 0.99 quantile of latencies
- compare exact value (store all data points), value from tdigest, and value from boost
- test 5 rounds
{noformat}
Exact  Tdigest  Boost-P2
   86       93      2130
  175      235      1526
  151      165      1926
  147      153       302
  251      313       561
{noformat}
[1] https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
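
The "exact value (store all data points)" baseline in the steps above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: the names ExactQuantile and RelativeError are hypothetical, and linear interpolation between neighbouring order statistics is an assumed convention, since the issue does not say how the exact value was computed.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: exact quantile computed from every stored sample.
// Assumes linear interpolation between the two nearest order statistics.
double ExactQuantile(std::vector<double> samples, double q) {
  if (samples.empty() || q < 0.0 || q > 1.0) {
    throw std::invalid_argument("need at least one sample and 0 <= q <= 1");
  }
  // Fractional rank of the requested quantile in the sorted data.
  const double pos = q * static_cast<double>(samples.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
  const std::size_t hi = std::min(lo + 1, samples.size() - 1);

  // Partial sort: places the lo-th order statistic at index lo,
  // with everything after it greater than or equal to it.
  const auto lo_it = samples.begin() + static_cast<std::ptrdiff_t>(lo);
  std::nth_element(samples.begin(), lo_it, samples.end());
  const double lo_val = *lo_it;

  // The next order statistic is the minimum of the remaining tail.
  const double hi_val =
      (hi == lo) ? lo_val : *std::min_element(lo_it + 1, samples.end());

  return lo_val + (pos - static_cast<double>(lo)) * (hi_val - lo_val);
}

// Hypothetical helper: relative error of an estimate against the exact value.
double RelativeError(double estimate, double exact) {
  return std::abs(estimate - exact) / exact;
}
```

Applied to the first round in the table, RelativeError(93, 86) is about 0.08 for TDigest, while RelativeError(2130, 86) is about 23.8 for Boost P-Square. Storing every sample is only practical as a test baseline, which is why a streaming estimator is used in the benchmark itself.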
was:
In Flight benchmark, boost accumulator is used to estimate latency quantiles (0.5, 0.95, 0.99). Internally, boost adopts P-Square algorithm [1]. P-Square is very bad at estimating skewed quantiles like 0.99, where TDigest shines.
Test result shows 0.99 latency is much better than what current code tells us. We should switch to TDigest.
- run flight-benchmark with default parameters
- calculate 0.99 quantile of latencies
- compare exact value (store all data points), value from tdigest, and value from boost
{noformat}
Exact Tdigest Boost-P2
86 93 2130
175 235 1526
151 165 1926
147 153 302
251 313 561
{noformat}
[1] https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
> [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark
> -----------------------------------------------------------------------
>
> Key: ARROW-11727
> URL: https://issues.apache.org/jira/browse/ARROW-11727
> Project: Apache Arrow
> Issue Type: Improvement
> Components: FlightRPC
> Reporter: Yibo Cai
> Assignee: Yibo Cai
> Priority: Major