You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/11/02 01:45:19 UTC

[kudu-CR] KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

Hello Alexey Serbin,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/8450

to review the following change.


Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
......................................................................

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
---
M src/kudu/clock/system_ntp.cc
M src/kudu/clock/system_ntp.h
2 files changed, 19 insertions(+), 48 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/8450/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Gerrit-Change-Number: 8450
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>

[kudu-CR] KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8450 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
......................................................................

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Reviewed-on: http://gerrit.cloudera.org:8080/8450
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/clock/system_ntp.cc
M src/kudu/clock/system_ntp.h
2 files changed, 21 insertions(+), 53 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Gerrit-Change-Number: 8450
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8450 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8450/5/src/kudu/clock/system_ntp.cc
File src/kudu/clock/system_ntp.cc:

http://gerrit.cloudera.org:8080/#/c/8450/5/src/kudu/clock/system_ntp.cc@49
PS5, Line 49: ntp_adjtime
> nit: ntp_adjtime
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Gerrit-Change-Number: 8450
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 03 Nov 2017 00:04:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, David Ribeiro Alves, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8450

to look at the new patch set (#6).

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
......................................................................

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
---
M src/kudu/clock/system_ntp.cc
M src/kudu/clock/system_ntp.h
2 files changed, 21 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/8450/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Gerrit-Change-Number: 8450
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/8450 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/8450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Gerrit-Change-Number: 8450
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, David Ribeiro Alves, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8450

to look at the new patch set (#5).

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
......................................................................

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
---
M src/kudu/clock/system_ntp.cc
M src/kudu/clock/system_ntp.h
2 files changed, 20 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/8450/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Gerrit-Change-Number: 8450
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8450 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8450/5/src/kudu/clock/system_ntp.cc
File src/kudu/clock/system_ntp.cc:

http://gerrit.cloudera.org:8080/#/c/8450/5/src/kudu/clock/system_ntp.cc@49
PS5, Line 49: ntp_gettime
nit: ntp_adjtime



-- 
To view, visit http://gerrit.cloudera.org:8080/8450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Gerrit-Change-Number: 8450
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 02 Nov 2017 21:32:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8450 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Gerrit-Change-Number: 8450
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 03 Nov 2017 05:11:28 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, David Ribeiro Alves, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8450

to look at the new patch set (#3).

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
......................................................................

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
---
M src/kudu/clock/system_ntp.cc
M src/kudu/clock/system_ntp.h
2 files changed, 20 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/8450/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Gerrit-Change-Number: 8450
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, David Ribeiro Alves, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8450

to look at the new patch set (#4).

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
......................................................................

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
---
M src/kudu/clock/hybrid_clock-test.cc
M src/kudu/clock/hybrid_clock.cc
M src/kudu/clock/hybrid_clock.h
M src/kudu/clock/system_ntp.cc
M src/kudu/clock/system_ntp.h
5 files changed, 115 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/8450/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Gerrit-Change-Number: 8450
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8450 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Gerrit-Change-Number: 8450
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 03 Nov 2017 05:11:10 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, David Ribeiro Alves, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8450

to look at the new patch set (#2).

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
......................................................................

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
---
M src/kudu/clock/system_ntp.cc
M src/kudu/clock/system_ntp.h
2 files changed, 20 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/8450/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8450
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa9241dea1304076ed5cfaec78779d15b11293ff
Gerrit-Change-Number: 8450
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot