You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2020/12/11 03:45:42 UTC

[kudu-CR] [server] add 'uptime' metric for server

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16858


Change subject: [server] add 'uptime' metric for server
......................................................................

[server] add 'uptime' metric for server

This patch adds a new 'uptime' metric for a Kudu server.  The metric's
value is reported as the duration of the time interval passed from the
start of the server.  The interval duration is reported in microseconds
as many other time-related metrics.

Initially I thought to reuse already existing field that stores server's
start time as wall clock time in seconds, but such approach would not
produce a reliable result given that the wall clock time can be set any
time by a super user, hence a wall-clock-based delta could not reliably
provide the actual uptime of a server.

The motivation for this patch is the realization that sometimes it's not
easy to interpret metric values reported for a server without knowing
its uptime.

I didn't add any tests for the newly introduced metric, but I manually
verified that both kudu-master and kudu-tserver processes report uptime
among their metrics at the "/metrics" HTTP endpoint.

Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
---
M src/kudu/integration-tests/registration-test.cc
M src/kudu/master/master.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/tserver/heartbeater.cc
5 files changed, 31 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/16858/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h
File src/kudu/server/server_base.h:

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h@185
PS1, Line 185:   int64_t start_walltime_;
> Is there a reason you changed the meaning of the original start time instea
I didn't change the meaning -- start_time_ originally was wall time, as it is now.



-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 11 Dec 2020 17:06:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16858/3/src/kudu/server/server_base.cc
File src/kudu/server/server_base.cc:

http://gerrit.cloudera.org:8080/#/c/16858/3/src/kudu/server/server_base.cc@422
PS3, Line 422: return (MonoTime::Now() - this->start_time())
> nit: would it make sense to DCHECK that start_time has been set?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 12 Dec 2020 02:46:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16858

to look at the new patch set (#3).

Change subject: [server] add 'uptime' metric for server
......................................................................

[server] add 'uptime' metric for server

This patch adds a new 'uptime' metric for a Kudu server.  The metric's
value is reported as the duration of the time interval passed from the
start of the server.  The interval duration is reported in microseconds
as many other time-related metrics.

Initially I thought to reuse already existing field that stores server's
start time as wall clock time in seconds, but such approach would not
produce a reliable result given that the wall clock time can be set any
time by a super user, hence a wall-clock-based delta could not reliably
provide the actual uptime of a server.

I also did a minor refactoring on BaseServer::Init() and
BaseServer::Start() methods, moving some steps from Init() to Start()
which are belonging more to Start() rather than Init().

The motivation for this patch is the realization that sometimes it's not
easy to interpret metric values reported for a server without knowing
its uptime.

I didn't add any tests for the newly introduced metric, but I manually
verified that both kudu-master and kudu-tserver processes report uptime
among their metrics at the "/metrics" HTTP endpoint.

Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
---
M src/kudu/common/wire_protocol.proto
M src/kudu/integration-tests/registration-test.cc
M src/kudu/master/master.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/tserver/heartbeater.cc
6 files changed, 54 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/16858/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 12 Dec 2020 04:32:41 +0000
Gerrit-HasComments: No

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16858/3/src/kudu/server/server_base.cc
File src/kudu/server/server_base.cc:

http://gerrit.cloudera.org:8080/#/c/16858/3/src/kudu/server/server_base.cc@422
PS3, Line 422: return (MonoTime::Now() - this->start_time())
nit: would it make sense to DCHECK that start_time has been set?



-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 12 Dec 2020 02:26:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has removed a vote on this change.

Change subject: [server] add 'uptime' metric for server
......................................................................


Removed Code-Review+2 by Andrew Wong <aw...@cloudera.com>
-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h
File src/kudu/server/server_base.h:

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h@185
PS1, Line 185:   int64_t start_walltime_;
> Right but the variable now called "start_time_" means something else.
That's why this patch required changing all the locations start_time_ was previously used.



-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 11 Dec 2020 17:11:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h
File src/kudu/server/server_base.h:

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h@185
PS1, Line 185:   int64_t start_walltime_;
> I didn't change the meaning -- start_time_ originally was wall time, as it 
Right but the variable now called "start_time_" means something else.



-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 11 Dec 2020 17:10:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Grant Henke, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16858

to look at the new patch set (#5).

Change subject: [server] add 'uptime' metric for server
......................................................................

[server] add 'uptime' metric for server

This patch adds a new 'uptime' metric for a Kudu server.  The metric's
value is reported as the duration of the time interval passed from the
start of the server.  The interval duration is reported in microseconds
as many other time-related metrics.

Initially I thought to reuse already existing field that stores server's
start time as wall clock time in seconds, but such approach would not
produce a reliable result given that the wall clock time can be set any
time by a super user, hence a wall-clock-based delta could not reliably
provide the actual uptime of a server.

I also did a minor refactoring on BaseServer::Init() and
BaseServer::Start() methods, moving some steps from Init() to Start()
which are belonging more to Start() rather than Init().

The motivation for this patch is the realization that sometimes it's not
easy to interpret metric values reported for a server without knowing
its uptime.

I didn't add any tests for the newly introduced metric, but I manually
verified that both kudu-master and kudu-tserver processes report uptime
among their metrics at the "/metrics" HTTP endpoint.

Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
---
M src/kudu/common/wire_protocol.proto
M src/kudu/integration-tests/registration-test.cc
M src/kudu/master/master.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/tserver/heartbeater.cc
6 files changed, 58 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/16858/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h
File src/kudu/server/server_base.h:

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h@185
PS1, Line 185:   int64_t start_walltime_;
Is there a reason you changed the meaning of the original start time instead of defining a new variable for the new time like start_monotime_?


http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h@187
PS1, Line 187: monotinic
typo?



-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 11 Dec 2020 15:46:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Grant Henke, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16858

to look at the new patch set (#4).

Change subject: [server] add 'uptime' metric for server
......................................................................

[server] add 'uptime' metric for server

This patch adds a new 'uptime' metric for a Kudu server.  The metric's
value is reported as the duration of the time interval passed from the
start of the server.  The interval duration is reported in microseconds
as many other time-related metrics.

Initially I thought to reuse already existing field that stores server's
start time as wall clock time in seconds, but such approach would not
produce a reliable result given that the wall clock time can be set any
time by a super user, hence a wall-clock-based delta could not reliably
provide the actual uptime of a server.

I also did a minor refactoring on BaseServer::Init() and
BaseServer::Start() methods, moving some steps from Init() to Start()
which are belonging more to Start() rather than Init().

The motivation for this patch is the realization that sometimes it's not
easy to interpret metric values reported for a server without knowing
its uptime.

I didn't add any tests for the newly introduced metric, but I manually
verified that both kudu-master and kudu-tserver processes report uptime
among their metrics at the "/metrics" HTTP endpoint.

Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
---
M src/kudu/common/wire_protocol.proto
M src/kudu/integration-tests/registration-test.cc
M src/kudu/master/master.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/tserver/heartbeater.cc
6 files changed, 57 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/16858/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................

[server] add 'uptime' metric for server

This patch adds a new 'uptime' metric for a Kudu server.  The metric's
value is reported as the duration of the time interval passed from the
start of the server.  The interval duration is reported in microseconds
as many other time-related metrics.

Initially I thought to reuse already existing field that stores server's
start time as wall clock time in seconds, but such approach would not
produce a reliable result given that the wall clock time can be set any
time by a super user, hence a wall-clock-based delta could not reliably
provide the actual uptime of a server.

I also did a minor refactoring on BaseServer::Init() and
BaseServer::Start() methods, moving some steps from Init() to Start()
which are belonging more to Start() rather than Init().

The motivation for this patch is the realization that sometimes it's not
easy to interpret metric values reported for a server without knowing
its uptime.

I didn't add any tests for the newly introduced metric, but I manually
verified that both kudu-master and kudu-tserver processes report uptime
among their metrics at the "/metrics" HTTP endpoint.

Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Reviewed-on: http://gerrit.cloudera.org:8080/16858
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/common/wire_protocol.proto
M src/kudu/integration-tests/registration-test.cc
M src/kudu/master/master.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/tserver/heartbeater.cc
6 files changed, 58 insertions(+), 27 deletions(-)

Approvals:
  Andrew Wong: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16858

to look at the new patch set (#2).

Change subject: [server] add 'uptime' metric for server
......................................................................

[server] add 'uptime' metric for server

This patch adds a new 'uptime' metric for a Kudu server.  The metric's
value is reported as the duration of the time interval passed from the
start of the server.  The interval duration is reported in microseconds
as many other time-related metrics.

Initially I thought to reuse already existing field that stores server's
start time as wall clock time in seconds, but such approach would not
produce a reliable result given that the wall clock time can be set any
time by a super user, hence a wall-clock-based delta could not reliably
provide the actual uptime of a server.

The motivation for this patch is the realization that sometimes it's not
easy to interpret metric values reported for a server without knowing
its uptime.

I didn't add any tests for the newly introduced metric, but I manually
verified that both kudu-master and kudu-tserver processes report uptime
among their metrics at the "/metrics" HTTP endpoint.

Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
---
M src/kudu/common/wire_protocol.proto
M src/kudu/integration-tests/registration-test.cc
M src/kudu/master/master.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/tserver/heartbeater.cc
6 files changed, 43 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/16858/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 4:

> > Looks like the test failure is relevant:
 > >
 > > F1212 02:58:27.233850 17340 server_base.h:120] Check failed:
 > > start_time_ > MonoTime::Min()
 > > *** Check failure stack trace: ***
 > > @     0x7ff90413788d  google::LogMessage::Fail() at ??:0
 > > @     0x7ff904139862  google::LogMessage::SendToLog() at ??:0
 > > @     0x7ff9041373ad  google::LogMessage::Flush() at ??:0
 > > @     0x7ff90413a279  google::LogMessageFatal::~LogMessageFatal()
 > > at ??:0
 > > @     0x7ff902954ee2  kudu::server::ServerBase::start_time() at
 > > ??:0
 > > @     0x7ff90294caa1  _ZZN4kudu6server10ServerBaseC4ENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_17ServerBaseOptionsERKS7_ENKUlvE_clEv
 > > at ??:0
 > > @     0x7ff902952405  _ZNSt17_Function_handlerIFlvEZN4kudu6server10ServerBaseC4ENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS2_17ServerBaseOptionsERKS9_EUlvE_E9_M_invokeERKSt9_Any_data
 > > at ??:0
 > > @     0x7ff90295d140  std::function<>::operator()() at ??:0
 > > @     0x7ff90295bc05  kudu::FunctionGauge<>::value() at ??:0
 > > @     0x7ff902959d46  kudu::FunctionGauge<>::DetachToCurrentValue()
 > > at ??:0
 > > @     0x7ff90295779f  _ZZN4kudu13FunctionGaugeIlE21AutoDetachToLastValueEPNS_21FunctionGaugeDetacherEENKUlvE_clEv
 > > at ??:0
 > > @     0x7ff90295bd8b  _ZNSt17_Function_handlerIFvvEZN4kudu13FunctionGaugeIlE21AutoDetachToLastValueEPNS1_21FunctionGaugeDetacherEEUlvE_E9_M_invokeERKSt9_Any_data
 > > at ??:0
 > > @     0x7ff9050364a4  std::function<>::operator()() at ??:0
 > > @     0x7ff904accca8  kudu::FunctionGaugeDetacher::~FunctionGaugeDetacher()
 > > at ??:0
 > > @     0x7ff90294dae8  kudu::server::ServerBase::~ServerBase() at
 > > ??:0
 > > @     0x7ff904fdee0a  kudu::kserver::KuduServer::~KuduServer() at
 > > ??:0
 > > @     0x7ff904fd8c7a  kudu::master::Master::~Master() at ??:0
 > > @     0x7ff904ffeef2  kudu::master::RunMasterServer() at ??:0
 > > @     0x5561fd8365ef  kudu::master::MasterMain() at ??:0
 > > @     0x5561fd8367f5  main at ??:0
 > > @     0x7ff9035b6bf7  __libc_start_main at ??:0
 > > @     0x5561fd83621a  _start at ??:0
 > > /home/jenkins-slave/workspace/kudu-master/0/src/kudu/integration-tests/security-faults-itest.cc:194:
 > > Failure
 > > Value of: s.ToString()
 > 
 > Right, and that's a sign that we don't need DCHECK there :)

The issue there is that in that test scenario the metrics detacher tries to capture the value of the metric before start time is assigned.


-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 12 Dec 2020 04:19:15 +0000
Gerrit-HasComments: No

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h
File src/kudu/server/server_base.h:

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h@185
PS1, Line 185:   int64_t start_walltime_;
> Ah, I guess the question then was for the naming, right?
Got it, I didn't understand that this makes the naming uniform with the other usages/instances of time.



-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 11 Dec 2020 17:40:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 12 Dec 2020 04:02:50 +0000
Gerrit-HasComments: No

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 4:

> Looks like the test failure is relevant:
 > 
 > F1212 02:58:27.233850 17340 server_base.h:120] Check failed:
 > start_time_ > MonoTime::Min()
 > *** Check failure stack trace: ***
 > @     0x7ff90413788d  google::LogMessage::Fail() at ??:0
 > @     0x7ff904139862  google::LogMessage::SendToLog() at ??:0
 > @     0x7ff9041373ad  google::LogMessage::Flush() at ??:0
 > @     0x7ff90413a279  google::LogMessageFatal::~LogMessageFatal()
 > at ??:0
 > @     0x7ff902954ee2  kudu::server::ServerBase::start_time() at
 > ??:0
 > @     0x7ff90294caa1  _ZZN4kudu6server10ServerBaseC4ENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_17ServerBaseOptionsERKS7_ENKUlvE_clEv
 > at ??:0
 > @     0x7ff902952405  _ZNSt17_Function_handlerIFlvEZN4kudu6server10ServerBaseC4ENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS2_17ServerBaseOptionsERKS9_EUlvE_E9_M_invokeERKSt9_Any_data
 > at ??:0
 > @     0x7ff90295d140  std::function<>::operator()() at ??:0
 > @     0x7ff90295bc05  kudu::FunctionGauge<>::value() at ??:0
 > @     0x7ff902959d46  kudu::FunctionGauge<>::DetachToCurrentValue()
 > at ??:0
 > @     0x7ff90295779f  _ZZN4kudu13FunctionGaugeIlE21AutoDetachToLastValueEPNS_21FunctionGaugeDetacherEENKUlvE_clEv
 > at ??:0
 > @     0x7ff90295bd8b  _ZNSt17_Function_handlerIFvvEZN4kudu13FunctionGaugeIlE21AutoDetachToLastValueEPNS1_21FunctionGaugeDetacherEEUlvE_E9_M_invokeERKSt9_Any_data
 > at ??:0
 > @     0x7ff9050364a4  std::function<>::operator()() at ??:0
 > @     0x7ff904accca8  kudu::FunctionGaugeDetacher::~FunctionGaugeDetacher()
 > at ??:0
 > @     0x7ff90294dae8  kudu::server::ServerBase::~ServerBase() at
 > ??:0
 > @     0x7ff904fdee0a  kudu::kserver::KuduServer::~KuduServer() at
 > ??:0
 > @     0x7ff904fd8c7a  kudu::master::Master::~Master() at ??:0
 > @     0x7ff904ffeef2  kudu::master::RunMasterServer() at ??:0
 > @     0x5561fd8365ef  kudu::master::MasterMain() at ??:0
 > @     0x5561fd8367f5  main at ??:0
 > @     0x7ff9035b6bf7  __libc_start_main at ??:0
 > @     0x5561fd83621a  _start at ??:0
 > /home/jenkins-slave/workspace/kudu-master/0/src/kudu/integration-tests/security-faults-itest.cc:194:
 > Failure
 > Value of: s.ToString()

Right, and that's a sign that we don't need DCHECK there :)


-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 12 Dec 2020 04:17:41 +0000
Gerrit-HasComments: No

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h
File src/kudu/server/server_base.h:

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h@185
PS1, Line 185:   int64_t start_walltime_;
> That's why this patch required changing all the locations start_time_ was p
Ah, I guess the question then was for the naming, right?

As for the naming, you can see that elsewhere in the code we don't use walltime, always using either monotime or timestamps.  Also, few other objects store MonoTime as start time and have corresponding names for their accessors, for example:
  * OpDriver::start_time()
  * Scanner::start_time()

This patch just follows the suite.



-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 11 Dec 2020 17:27:35 +0000
Gerrit-HasComments: Yes

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h
File src/kudu/server/server_base.h:

http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h@185
PS1, Line 185:   int64_t start_walltime_;
> Got it, I didn't understand that this makes the naming uniform with the oth
Yup, I think having more uniform naming is better.


http://gerrit.cloudera.org:8080/#/c/16858/1/src/kudu/server/server_base.h@187
PS1, Line 187: monotinic
> typo?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 11 Dec 2020 19:38:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] [server] add 'uptime' metric for server

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16858 )

Change subject: [server] add 'uptime' metric for server
......................................................................


Patch Set 4:

Looks like the test failure is relevant:

F1212 02:58:27.233850 17340 server_base.h:120] Check failed: start_time_ > MonoTime::Min() 
*** Check failure stack trace: ***
    @     0x7ff90413788d  google::LogMessage::Fail() at ??:0
    @     0x7ff904139862  google::LogMessage::SendToLog() at ??:0
    @     0x7ff9041373ad  google::LogMessage::Flush() at ??:0
    @     0x7ff90413a279  google::LogMessageFatal::~LogMessageFatal() at ??:0
    @     0x7ff902954ee2  kudu::server::ServerBase::start_time() at ??:0
    @     0x7ff90294caa1  _ZZN4kudu6server10ServerBaseC4ENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_17ServerBaseOptionsERKS7_ENKUlvE_clEv at ??:0
    @     0x7ff902952405  _ZNSt17_Function_handlerIFlvEZN4kudu6server10ServerBaseC4ENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS2_17ServerBaseOptionsERKS9_EUlvE_E9_M_invokeERKSt9_Any_data at ??:0
    @     0x7ff90295d140  std::function<>::operator()() at ??:0
    @     0x7ff90295bc05  kudu::FunctionGauge<>::value() at ??:0
    @     0x7ff902959d46  kudu::FunctionGauge<>::DetachToCurrentValue() at ??:0
    @     0x7ff90295779f  _ZZN4kudu13FunctionGaugeIlE21AutoDetachToLastValueEPNS_21FunctionGaugeDetacherEENKUlvE_clEv at ??:0
    @     0x7ff90295bd8b  _ZNSt17_Function_handlerIFvvEZN4kudu13FunctionGaugeIlE21AutoDetachToLastValueEPNS1_21FunctionGaugeDetacherEEUlvE_E9_M_invokeERKSt9_Any_data at ??:0
    @     0x7ff9050364a4  std::function<>::operator()() at ??:0
    @     0x7ff904accca8  kudu::FunctionGaugeDetacher::~FunctionGaugeDetacher() at ??:0
    @     0x7ff90294dae8  kudu::server::ServerBase::~ServerBase() at ??:0
    @     0x7ff904fdee0a  kudu::kserver::KuduServer::~KuduServer() at ??:0
    @     0x7ff904fd8c7a  kudu::master::Master::~Master() at ??:0
    @     0x7ff904ffeef2  kudu::master::RunMasterServer() at ??:0
    @     0x5561fd8365ef  kudu::master::MasterMain() at ??:0
    @     0x5561fd8367f5  main at ??:0
    @     0x7ff9035b6bf7  __libc_start_main at ??:0
    @     0x5561fd83621a  _start at ??:0
/home/jenkins-slave/workspace/kudu-master/0/src/kudu/integration-tests/security-faults-itest.cc:194: Failure
Value of: s.ToString()


-- 
To view, visit http://gerrit.cloudera.org:8080/16858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb362a71c625c05a2185b95ec24d3f371dcb4155
Gerrit-Change-Number: 16858
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 12 Dec 2020 04:04:35 +0000
Gerrit-HasComments: No