Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/02/24 14:56:00 UTC

[jira] [Resolved] (ARROW-11727) [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark

     [ https://issues.apache.org/jira/browse/ARROW-11727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Li resolved ARROW-11727.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 9558
[https://github.com/apache/arrow/pull/9558]

> [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark
> -----------------------------------------------------------------------
>
>                 Key: ARROW-11727
>                 URL: https://issues.apache.org/jira/browse/ARROW-11727
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: FlightRPC
>            Reporter: Yibo Cai
>            Assignee: Yibo Cai
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the Flight benchmark, a Boost accumulator is used to estimate latency quantiles (0.5, 0.95, 0.99). Internally, Boost uses the P-Square algorithm [1]. P-Square is very poor at estimating tail quantiles such as 0.99, where TDigest shines.
> Test results show that the actual 0.99 latency is much better than what the current code reports. We should switch to TDigest. The test was run as follows:
>  - run flight-benchmark with default parameters
>  - calculate 0.99 quantile of latencies
>  - compare exact value (store all data points), value from tdigest, and value from boost
>  - repeat for 5 rounds
> {noformat}
> Exact  TDigest  Boost-P2
> 86     93       2130
> 175    235      1526
> 151    165      1926
> 147    153      302
> 251    313      561
> {noformat}
> TDigest gives more accurate values for all quantiles. For the 0.5 quantile, both TDigest and Boost give very accurate results. For the 0.95 quantile, TDigest gives an almost exact value, while Boost deviates slightly.
> [1] [https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf]
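
The "exact value (store all data points)" baseline in the comparison above can be sketched roughly as follows. This is only an illustration, not the code used in the benchmark or in the pull request: the function names ExactQuantile and RelativeError are hypothetical, and the nearest-rank quantile definition is an assumption, since the issue does not say which definition was used.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: exact q-quantile of all stored samples, using the
// nearest-rank definition (an assumption; other definitions interpolate).
// The vector is taken by value because std::nth_element reorders it.
double ExactQuantile(std::vector<double> samples, double q) {
  assert(!samples.empty() && q > 0.0 && q <= 1.0);
  // Nearest rank: the smallest sample with at least q * N samples <= it.
  std::size_t rank = static_cast<std::size_t>(std::ceil(q * samples.size()));
  if (rank == 0) rank = 1;
  auto nth = samples.begin() + (rank - 1);
  // Partial sort: only the element at position `nth` is guaranteed to be
  // in its sorted place, which is all that is needed here.
  std::nth_element(samples.begin(), nth, samples.end());
  return *nth;
}

// Hypothetical helper: relative error of an estimated quantile against
// the exact value.
double RelativeError(double estimate, double exact) {
  return std::abs(estimate - exact) / exact;
}
```

Applied to the first round in the table, RelativeError(93, 86) is about 0.08 for TDigest, while RelativeError(2130, 86) is about 23.8 for Boost P-Square. Storing every sample costs memory proportional to the number of measurements, which is why a streaming estimator is used in the benchmark itself.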



--
This message was sent by Atlassian Jira
(v8.3.4#803005)