Posted to reviews@kudu.apache.org by "Grant Henke (Code Review)" <ge...@cloudera.org> on 2020/02/20 17:14:12 UTC

[kudu-CR] KUDU-3056: Reduce HdrHistogramAccumulator overhead

Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15254 )


Change subject: KUDU-3056: Reduce HdrHistogramAccumulator overhead
......................................................................

KUDU-3056: Reduce HdrHistogramAccumulator overhead

This patch makes a few changes to reduce the overhead of the
HdrHistogramAccumulator.

It changes from using `SynchronizedHistogram` (value type
long) to using `IntCountsHistogram` (value type int).
This significantly reduces the data footprint of the histogram and is
safe given write durations will never exceed `Integer.MAX_VALUE`.
Because thread safety is still important, we synchronize all access
to `IntCountsHistogram` in `HistogramWrapper`.

It also adjusts the `HistogramWrapper` to lazily instantiate an
`IntCountsHistogram`. This means that if no values are recorded,
the overhead of the `HdrHistogramAccumulator` should be almost
zero.
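
As a rough illustration of the lazy, synchronized wrapper idea described
above, here is a minimal Java sketch. It is not the code in this patch:
the class name `LazyCountsSketch` and its methods are hypothetical, and a
plain `int[]` stands in for `IntCountsHistogram` so the example runs with
only the standard library.

```java
/** Hypothetical stand-in for a lazily allocated, synchronized wrapper. */
final class LazyCountsSketch {
    private final int maxValue;
    // Stays null until the first value is recorded, so an unused
    // instance holds no counts storage at all.
    private int[] counts;

    LazyCountsSketch(int maxValue) {
        this.maxValue = maxValue;
    }

    /** Records one value; allocates the int counts on first use. */
    synchronized void record(int value) {
        if (value < 0 || value > maxValue) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        if (counts == null) {
            counts = new int[maxValue + 1];
        }
        counts[value]++;
    }

    /** Returns the number of recorded values (0 if nothing was recorded). */
    synchronized long totalCount() {
        if (counts == null) {
            return 0L;
        }
        long total = 0L;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    /** Reports whether the backing storage has been allocated yet. */
    synchronized boolean isAllocated() {
        return counts != null;
    }

    public static void main(String[] args) {
        LazyCountsSketch h = new LazyCountsSketch(30000);
        System.out.println("allocated before any record: " + h.isAllocated());
        h.record(265);
        System.out.println("allocated after one record: " + h.isAllocated());
        System.out.println("total count: " + h.totalCount());
    }
}
```

Every method that touches the counts is `synchronized` on the wrapper, so
the null check and the allocation cannot race with a concurrent record.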

Lastly, it reduces the `numberOfSignificantValueDigits` tracked
in the histogram from 3 to 2. The result is relatively similar
output in the Spark accumulator with a significantly smaller
histogram.

I tested each variant using `getEstimatedFootprintInBytes()` and
the result is that the new implementation is 90% smaller when the
HdrHistogramAccumulator is used. The new implementation
is 100% smaller when no values are stored:

long w/ precision 3 & max 30000ms: 49664 (current)
long w/ precision 2 & max 30000ms: 9728
long w/ precision 1 & max 30000ms: 2048
int  w/ precision 3 & max 30000ms: 25088
int  w/ precision 2 & max 30000ms: 5120 (new)
int  w/ precision 1 & max 30000ms: 1280

Note: I used a max of 30000ms in these calculations because that
is the default operation timeout.
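
The percentages quoted above follow from the listed footprints. A small
Java sketch of the arithmetic is below. The class and method names are
hypothetical, the byte counts are copied from the list above, and the
zero footprint for the unused case reflects the "100% smaller" claim
rather than a separate measurement.

```java
import java.util.Locale;

/** Hypothetical helper that recomputes the size reductions quoted above. */
final class FootprintReduction {
    /** Percentage by which afterBytes is smaller than beforeBytes. */
    static double reductionPercent(long beforeBytes, long afterBytes) {
        return (beforeBytes - afterBytes) * 100.0 / beforeBytes;
    }

    public static void main(String[] args) {
        long currentBytes = 49664L; // long w/ precision 3 (current)
        long newBytes = 5120L;      // int w/ precision 2 (new)

        // Used accumulator: prints 89.7%, i.e. roughly 90% smaller.
        System.out.println(String.format(
            Locale.ROOT, "%.1f%%", reductionPercent(currentBytes, newBytes)));

        // Unused accumulator, assuming nothing is allocated: prints 100.0%.
        System.out.println(String.format(
            Locale.ROOT, "%.1f%%", reductionPercent(currentBytes, 0L)));
    }
}
```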

Below is sample string output from before and after this patch
generated with 1000 random values between 0ms and 500ms.

Before:
0.2%: 0ms, 50.3%: 265ms, 75.1%: 376ms, 87.5%: 437ms, 93.8%: 470ms, 96.9%: 484ms, 98.6%: 493ms, 99.5%: 496ms, 99.8%: 498ms, 100.0%: 499ms, 100.0%: 499ms

After:
0.2%: 0ms, 50.3%: 265ms, 75.4%: 377ms, 87.5%: 437ms, 93.9%: 471ms, 97.3%: 485ms, 98.6%: 493ms, 99.5%: 497ms, 100.0%: 499ms, 100.0%: 499ms

Note: I used the same seed to generate the same values for both strings.
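
For readers who want to reproduce a comparison like this, one way to
feed both variants identical input is a seeded `java.util.Random`. The
sketch below is an assumption about the approach, not the code used for
the strings above: the seed value, the class name, and the choice of an
exclusive upper bound of 500 are all hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical generator of repeatable sample durations. */
final class SeededSample {
    /** Returns count values in [0, boundMs) drawn from a seeded generator. */
    static int[] sample(long seed, int count, int boundMs) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(boundMs);
        }
        return values;
    }

    public static void main(String[] args) {
        long seed = 42L; // hypothetical seed; any fixed value works
        int[] before = sample(seed, 1000, 500);
        int[] after = sample(seed, 1000, 500);
        // The same seed yields the same sequence, so both histogram
        // variants would see identical input.
        System.out.println(Arrays.equals(before, after)); // prints true
    }
}
```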

Change-Id: Ic7c2a33bc61a2baa38703ea3340a07e06ab39db3
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/HdrHistogramAccumulator.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
2 files changed, 68 insertions(+), 29 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/15254/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic7c2a33bc61a2baa38703ea3340a07e06ab39db3
Gerrit-Change-Number: 15254
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <gr...@apache.org>