You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Henry Robinson (Code Review)" <ge...@cloudera.org> on 2016/09/22 18:10:48 UTC

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4516

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................

IMPALA-4187: Switch RPC latency metrics to histograms

It's usually better to measure latency distributions with histograms,
not avg / min / max. This change switches out the metrics that measure
the latency of RPC processing time to histograms.

Add HistogramMetric::Reset() to allow histogram to remove all
entries. Added spinlock around histogram access.

Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
---
M be/src/rpc/rpc-trace.cc
M be/src/rpc/rpc-trace.h
M be/src/util/histogram-metric.h
3 files changed, 58 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/4516/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

Can you give an example of what the old and new metries look like?  Does CM need to be updated at all?

http://gerrit.cloudera.org:8080/#/c/4516/3/be/src/util/histogram-metric.h
File be/src/util/histogram-metric.h:

Line 107:   // Protects histogram_
how about: Protects histogram_ pointer.
to make it clear that it's only the pointer, not the implementation that's being protected.


-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4516/3/be/src/rpc/rpc-trace.cc
File be/src/rpc/rpc-trace.cc:

PS3, Line 179:       constexpr int32_t SIXTY_MINUTES_IN_MS = 60 * 1000 * 60;
> We'll see users with long running queries go over this (not uncommon) for h
I checked - any value > 60 minutes will be clipped to 60 minutes. That seems ok - beyond 60 minutes I'm not sure there's much use in knowing the distribution. 60 minutes might be too much since most RPCs will never get anywhere close, but the total data structure cost is 100KB and there are only a handful of them for all the RPCs we support.


-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Juan Yu (Code Review)" <ge...@cloudera.org>.
Juan Yu has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 2:

What's the performance of updating this metrics?

-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 3:

(1 comment)

Added before and after to the commit msg.

http://gerrit.cloudera.org:8080/#/c/4516/3/be/src/util/histogram-metric.h
File be/src/util/histogram-metric.h:

Line 107:   // Protects histogram_
> how about: Protects histogram_ pointer.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Juan Yu (Code Review)" <ge...@cloudera.org>.
Juan Yu has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 3: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4516/2/be/src/util/histogram-metric.h
File be/src/util/histogram-metric.h:

PS2, Line 70: id Update(int64_t val) {
> I'm not sure I follow - HdrHistogram is thread-safe (at least per the class
Nvm, looked at it more carefully again, and it is lock-free but thread safe.


-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Juan Yu (Code Review)" <ge...@cloudera.org>.
Juan Yu has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4516/2/be/src/rpc/rpc-trace.cc
File be/src/rpc/rpc-trace.cc:

PS2, Line 181: SIXTY_MINUTES_IN_MS
> Sure, but what's the drawback of having 60 minutes be the upper bound?
More buckets and HdrHistogram::ValueAtPercentile() will take longer time. But we don't update metrics that often so it's ok.


-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4516/3/be/src/rpc/rpc-trace.cc
File be/src/rpc/rpc-trace.cc:

PS3, Line 179:       constexpr int32_t SIXTY_MINUTES_IN_MS = 60 * 1000 * 60;
We'll see users with long running queries go over this (not uncommon) for hs2::ExecuteStatement/beeswax::query

I'm not sure it's worth doing anything about, but do you know how it will be handled?


-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4516/3/be/src/rpc/rpc-trace.cc
File be/src/rpc/rpc-trace.cc:

PS3, Line 179:       constexpr int32_t SIXTY_MINUTES_IN_MS = 60 * 1000 * 60;
> I checked - any value > 60 minutes will be clipped to 60 minutes. That seem
wfm, thanks for checking.


-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Juan Yu (Code Review)" <ge...@cloudera.org>.
Juan Yu has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4516/2/be/src/rpc/rpc-trace.cc
File be/src/rpc/rpc-trace.cc:

PS2, Line 181: SIXTY_MINUTES_IN_MS
If I understand correctly, this means the longest RPC call could be 60min. Isn't it too long? most of RPC should be in ms range or even shorter, 1 min is already abnormal, right?


http://gerrit.cloudera.org:8080/#/c/4516/2/be/src/util/histogram-metric.h
File be/src/util/histogram-metric.h:

PS2, Line 37: highest_trackable_value, num_significant_digits
would it be worth to store those in HistogramMetric?


-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 4: Code-Review+2

Carry +2, made Matt's suggested changes as well.

-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 2:

The core change to HistogramMetric (adding a Reset() method) is needed for our proposed solution to https://gerrit.cloudera.org/#/c/4500/.

Since I had this patch all ready to go, might as well get the RPC metric changes reviewed at the same time.

-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


IMPALA-4187: Switch RPC latency metrics to histograms

It's usually better to measure latency distributions with histograms,
not avg / min / max. This change switches out the metrics that measure
the latency of RPC processing time to histograms.

Add HistogramMetric::Reset() to allow histogram to remove all
entries. Added spinlock around histogram access.

On a 8-core i7 @ 3.4GHz, the following throughputs were observed:

1 thread -> 25M updates/sec
4 threads -> 7M updates/sec
16 threads -> 5M updates/sec

Each histogram takes ~108KB of storage for its buckets. This can be
reduced by reducing the maximum value, currently 60 minutes.

The new metrics have the following text representation:

Count: 148, 25th %-ile: 0, 50th %-ile: 0, 75th %-ile: 0, 90th %-ile: 0,
95th %-ile: 0, 99.9th %-ile: 1ms

Which changes from the old:

count: 345, last: 6ms, min: 0, max: 12ms, mean: 1ms, stddev: 1ms

Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Reviewed-on: http://gerrit.cloudera.org:8080/4516
Reviewed-by: Henry Robinson <he...@cloudera.com>
Tested-by: Internal Jenkins
---
M be/src/rpc/rpc-trace.cc
M be/src/rpc/rpc-trace.h
M be/src/util/histogram-metric.h
M common/thrift/metrics.json
4 files changed, 64 insertions(+), 37 deletions(-)

Approvals:
  Henry Robinson: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 2:

(8 comments)

Perf results are in the commit msg now.

http://gerrit.cloudera.org:8080/#/c/4516/2/be/src/rpc/rpc-trace.cc
File be/src/rpc/rpc-trace.cc:

PS2, Line 179: int32_t
> uint32_t?
We typically don't use uint (not consistent in that, unfortunately).


PS2, Line 181: 3
> nit: I've previously learned that it's a good practice not to use "magic" c
Happy to add a comment.


PS2, Line 181: SIXTY_MINUTES_IN_MS
> If I understand correctly, this means the longest RPC call could be 60min. 
Sure, but what's the drawback of having 60 minutes be the upper bound?


http://gerrit.cloudera.org:8080/#/c/4516/2/be/src/util/histogram-metric.h
File be/src/util/histogram-metric.h:

PS2, Line 37: highest_trackable_value, num_significant_digits
> would it be worth to store those in HistogramMetric?
What would be the benefit? They're already stored in HdrHistogram.


Line 44: 
> There's no shared state access before this line. Worth locking from here?
Done


Line 58: 
> And releasing the lock here?
Done


PS2, Line 70: histogram_->Increment(val);
> On an unrelated note, I looked through the implementation of HdrHistogram a
I'm not sure I follow - HdrHistogram is thread-safe (at least per the class comment). The reason we have locks here is to protect access to the pointer to the histogram object.


PS2, Line 107: boost::scoped_ptr<HdrHistogram> histogram_;
> I know we had a scoped_ptr vs unique_ptr discussion thread recently, but it
Yes, I think that was the result of the conversation so far. scoped_ptr for objects whose ownership will never change.


-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 3:

> (1 comment)
 > 
 > Can you give an example of what the old and new metries look like? 
 > Does CM need to be updated at all?

CM doesn't collect these metrics yet (I checked their code before), but this reminds me we should update common/thrift/metrics.json

The rpc-method

  {
    "description": "Duration (ms) of RPC calls to $0",
    "contexts": [
      "CATALOGSERVER",
      "STATESTORE",
      "IMPALAD"
    ],
    "label": "$0 RPC Call Duration",
    "units": "TIME_MS",
    "kind": "STATS",
    "key": "rpc-method.$0.call_duration"
  },


STATS -> HISTOGRAM

I expected there to be a runtime check to catch this mistake, but it looks like we check the metric "type" in SimpleMetric and not in HistogramMetric.

The histogram metric constructor should

DCHECK_EQ(TMetricKind::HISTOGRAM, metric_def.kind);

-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4516/2/be/src/rpc/rpc-trace.cc
File be/src/rpc/rpc-trace.cc:

PS2, Line 179: int32_t
uint32_t?


PS2, Line 181: 3
nit: I've previously learned that it's a good practice not to use "magic" constants such as used here, or if it's absolutely necessary, a small comment should explain what the use of said constant is.

Is that something that's worth following?


http://gerrit.cloudera.org:8080/#/c/4516/2/be/src/util/histogram-metric.h
File be/src/util/histogram-metric.h:

Line 44: 
There's no shared state access before this line. Worth locking from here?


Line 58: 
And releasing the lock here?


PS2, Line 70: histogram_->Increment(val);
On an unrelated note, I looked through the implementation of HdrHistogram and there seems to be use of a lot of atomic operations, but that doesn't make it race-free (which is why it makes sense to use a lock here).

Any idea why HdrHistogram is written that way? (Not super important. I was just curious; if you're not sure, we can always revisit it later)


PS2, Line 107: boost::scoped_ptr<HdrHistogram> histogram_;
I know we had a scoped_ptr vs unique_ptr discussion thread recently, but it doesn't seem like we reached a verdict.

It doesn't seem like we intend to move() or release() this pointer, but are we okay with increasing dependency on boost by adding new scoped_ptrs?


-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Hello Juan Yu, Matthew Jacobs, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4516

to look at the new patch set (#4).

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................

IMPALA-4187: Switch RPC latency metrics to histograms

It's usually better to measure latency distributions with histograms,
not avg / min / max. This change switches out the metrics that measure
the latency of RPC processing time to histograms.

Add HistogramMetric::Reset() to allow histogram to remove all
entries. Added spinlock around histogram access.

On a 8-core i7 @ 3.4GHz, the following throughputs were observed:

1 thread -> 25M updates/sec
4 threads -> 7M updates/sec
16 threads -> 5M updates/sec

Each histogram takes ~108KB of storage for its buckets. This can be
reduced by reducing the maximum value, currently 60 minutes.

The new metrics have the following text representation:

Count: 148, 25th %-ile: 0, 50th %-ile: 0, 75th %-ile: 0, 90th %-ile: 0,
95th %-ile: 0, 99.9th %-ile: 1ms

Which changes from the old:

count: 345, last: 6ms, min: 0, max: 12ms, mean: 1ms, stddev: 1ms

Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
---
M be/src/rpc/rpc-trace.cc
M be/src/rpc/rpc-trace.h
M be/src/util/histogram-metric.h
M common/thrift/metrics.json
4 files changed, 64 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/4516/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has uploaded a new patch set (#3).

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
......................................................................

IMPALA-4187: Switch RPC latency metrics to histograms

It's usually better to measure latency distributions with histograms,
not avg / min / max. This change switches out the metrics that measure
the latency of RPC processing time to histograms.

Add HistogramMetric::Reset() to allow histogram to remove all
entries. Added spinlock around histogram access.

On a 8-core i7 @ 3.4GHz, the following throughputs were observed:

1 thread -> 25M updates/sec
4 threads -> 7M updates/sec
16 threads -> 5M updates/sec

Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
---
M be/src/rpc/rpc-trace.cc
M be/src/rpc/rpc-trace.h
M be/src/util/histogram-metric.h
3 files changed, 61 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/4516/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>