Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/02/24 14:56:00 UTC
[jira] [Resolved] (ARROW-11727) [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark
[ https://issues.apache.org/jira/browse/ARROW-11727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Li resolved ARROW-11727.
------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 9558
[https://github.com/apache/arrow/pull/9558]
> [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark
> -----------------------------------------------------------------------
>
> Key: ARROW-11727
> URL: https://issues.apache.org/jira/browse/ARROW-11727
> Project: Apache Arrow
> Issue Type: Improvement
> Components: FlightRPC
> Reporter: Yibo Cai
> Assignee: Yibo Cai
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The Flight benchmark uses a Boost accumulator to estimate latency quantiles (0.5, 0.95, 0.99). Internally, Boost uses the P-Square algorithm [1]. P-Square is very poor at estimating quantiles far from the median, such as 0.99, where TDigest shines.
> Test results show that the actual 0.99 latency is much lower than what the current code reports. We should switch to TDigest.
> - run flight-benchmark with default parameters
> - calculate 0.99 quantile of latencies
> - compare exact value (store all data points), value from tdigest, and value from boost
> - test 5 rounds
> {noformat}
> Exact  TDigest  Boost-P2
>    86       93      2130
>   175      235      1526
>   151      165      1926
>   147      153       302
>   251      313       561
> {noformat}
> TDigest gives more accurate values for all quantiles. For the 0.5 quantile, both TDigest and Boost give very accurate results. For the 0.95 quantile, TDigest gives an almost exact value, while Boost deviates slightly.
> [1] [https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf]
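The "exact value (store all data points)" baseline in the comparison above can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the names `ExactQuantile` and `RelativeError` are hypothetical, and the nearest-rank quantile definition is an assumption, since the issue does not say which definition was used for the exact column.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: exact q-quantile over all stored samples, using the
// nearest-rank rule (an assumption; other quantile definitions interpolate).
// The vector is taken by value because std::nth_element reorders it.
double ExactQuantile(std::vector<double> samples, double q) {
  assert(!samples.empty());
  assert(q > 0.0 && q <= 1.0);
  const std::size_t n = samples.size();
  // Nearest rank: the smallest 1-based rank covering at least q * n samples.
  std::size_t rank =
      static_cast<std::size_t>(std::ceil(q * static_cast<double>(n)));
  if (rank < 1) rank = 1;
  // Partial sort: places the rank-th smallest element at index rank - 1.
  std::nth_element(samples.begin(), samples.begin() + (rank - 1),
                   samples.end());
  return samples[rank - 1];
}

// Hypothetical helper: relative error of an estimate against the exact value.
double RelativeError(double exact, double estimate) {
  return std::abs(estimate - exact) / exact;
}
```

Applied to the first row of the table, `RelativeError(86, 93)` is roughly 0.08 for TDigest, while the Boost P-Square estimate of 2130 is about 25 times the exact value. Storing every sample is fine for a comparison like this one but costs memory proportional to the number of requests, which is why the benchmark relies on a streaming estimator.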
--
This message was sent by Atlassian Jira
(v8.3.4#803005)