You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2020/02/23 04:00:19 UTC

[kudu-CR] [clock] relax version constraint for NTP response

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15274


Change subject: [clock] relax version constraint for NTP response
......................................................................

[clock] relax version constraint for NTP response

As it turned out, there are NTP servers in the wild that respond back
with version 4 even if they support version 3 and the client sent
version 3 in its request packet.  In this particular case, it was the
dedicated server in one of r5ad.xlarge EC2 instances provisioned in
EC2 us-west-2 (interestingly enough, the dedicated NTP servers of
other EC2 t2.large instances in my test cluster behaved as expected).
The observed behavior of the dedicated NTP server at the mentioned
r5ad.xlarge instance appears to contradict the information presented
in RFC 5905 (appendix A.5.3. fast_xmit()) and RFC 4330 (Secion 5.
SNTP Client Operations), but it is what it is.

Because the former strict constraint in the built-in NTP client,
it was not able to work with such NTP servers.

This patch relaxes the requirement for expected NTP versions sent in
a server's response.  For details, see the corresponding comment
in builtin_ntp.cc.

As for testing and verification, I ran the updated code and verified
that the built-in NTP client is now able to work with the NTP server
mentioned above, tracking true time as expected.  As a side note,
I also explored options to implement a test scenario using MiniChronyd,
but I found no way to configure chronyd to behave in a non-conforming
manner described above.

Change-Id: I1df3549616bbced696380458978c33264ed636c9
---
M src/kudu/clock/builtin_ntp.cc
1 file changed, 17 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/15274/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1df3549616bbced696380458978c33264ed636c9
Gerrit-Change-Number: 15274
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [clock] relax version constraint for NTP response

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15274 )

Change subject: [clock] relax version constraint for NTP response
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG@13
PS1, Line 13: interestingly enough, the dedicated NTP servers of
            : other EC2 t2.large instances in my test cluster behaved as expected
I'd expect the NTP server behavior to be a function of the NTP server software running in the instance. Was there a difference in the r5ad.xlarge and t2.large instances you tested with?


http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG@17
PS1, Line 17: Secion
Section


http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG@20
PS1, Line 20: the
of the


http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG@29
PS1, Line 29: As a side note,
            : I also explored options to implement a test scenario using MiniChronyd,
            : but I found no way to configure chronyd to behave in a non-conforming
            : manner described above.
Seems like it'd be good to have the ability to create a "mock" chronyd that we can use to send arbitrary packets to the built-in NTP client. Or, alternatively, an interceptor that can sit between a real chronyd and the built-in NTP client, and modify packets to contrive test scenarios.

My only issue with this patch is that it has no regression test coverage; if someone accidentally reverts your patch down the line, we won't notice. Is there a minimal amount of "mocking" we could use to test this scenario that won't take a long time to implement?


http://gerrit.cloudera.org:8080/#/c/15274/1/src/kudu/clock/builtin_ntp.cc
File src/kudu/clock/builtin_ntp.cc:

http://gerrit.cloudera.org:8080/#/c/15274/1/src/kudu/clock/builtin_ntp.cc@732
PS1, Line 732: Secion
Section



-- 
To view, visit http://gerrit.cloudera.org:8080/15274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1df3549616bbced696380458978c33264ed636c9
Gerrit-Change-Number: 15274
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 23 Feb 2020 22:10:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] [clock] relax version constraint for NTP response

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15274 )

Change subject: [clock] relax version constraint for NTP response
......................................................................

[clock] relax version constraint for NTP response

As it turned out, there are NTP servers in the wild that respond back
with version 4 even if they support version 3 and the client sent
version 3 in its request packet.  In this particular case, it was the
dedicated server in one of r5ad.xlarge EC2 instances provisioned in EC2
us-west-2 (interestingly enough, the dedicated NTP servers of other
instances of EC2 t2.large type in my test cluster behaved as expected).
The observed behavior of the dedicated NTP server at the mentioned
r5ad.xlarge instance appears to contradict the information presented
in RFC 5905 (appendix A.5.3. fast_xmit()) and RFC 4330 (Section 5.
SNTP Client Operations), but it is what it is.

Because of the former strict constraint in the built-in NTP client,
it was not able to work with such NTP servers.

This patch relaxes the requirement for expected NTP versions sent in
a server's response.  For details, see the corresponding comment
in builtin_ntp.cc.

As for testing and verification, I ran the updated code and verified
that the built-in NTP client is now able to work with the NTP server
mentioned above, tracking true time as expected.  As a side note,
I also explored options to implement a test scenario using MiniChronyd,
but I found no way to configure chronyd to behave in a non-conforming
manner described above.

Change-Id: I1df3549616bbced696380458978c33264ed636c9
Reviewed-on: http://gerrit.cloudera.org:8080/15274
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/clock/builtin_ntp.cc
1 file changed, 17 insertions(+), 2 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/15274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1df3549616bbced696380458978c33264ed636c9
Gerrit-Change-Number: 15274
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [clock] relax version constraint for NTP response

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15274 )

Change subject: [clock] relax version constraint for NTP response
......................................................................


Patch Set 1: Verified+1

Unrelated test failure in MasterReplicationAndRpcSizeLimitTest.TabletReports


-- 
To view, visit http://gerrit.cloudera.org:8080/15274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1df3549616bbced696380458978c33264ed636c9
Gerrit-Change-Number: 15274
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 23 Feb 2020 04:27:13 +0000
Gerrit-HasComments: No

[kudu-CR] [clock] relax version constraint for NTP response

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [clock] relax version constraint for NTP response
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/15274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I1df3549616bbced696380458978c33264ed636c9
Gerrit-Change-Number: 15274
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [clock] relax version constraint for NTP response

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15274 )

Change subject: [clock] relax version constraint for NTP response
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG@13
PS1, Line 13: interestingly enough, the dedicated NTP servers of
            : other EC2 t2.large instances in my test cluster behaved as expected
> Oh, this is a change in behavior of the _AWS NTP server_. I thought you wer
Yep, that was about AWS NTP servers.  I was playing with the 'auto' time source and hit the issue described in the commit message.

Thank you for the review!



-- 
To view, visit http://gerrit.cloudera.org:8080/15274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1df3549616bbced696380458978c33264ed636c9
Gerrit-Change-Number: 15274
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 24 Feb 2020 21:16:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] [clock] relax version constraint for NTP response

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15274

to look at the new patch set (#2).

Change subject: [clock] relax version constraint for NTP response
......................................................................

[clock] relax version constraint for NTP response

As it turned out, there are NTP servers in the wild that respond back
with version 4 even if they support version 3 and the client sent
version 3 in its request packet.  In this particular case, it was the
dedicated server in one of r5ad.xlarge EC2 instances provisioned in EC2
us-west-2 (interestingly enough, the dedicated NTP servers of other
instances of EC2 t2.large type in my test cluster behaved as expected).
The observed behavior of the dedicated NTP server at the mentioned
r5ad.xlarge instance appears to contradict the information presented
in RFC 5905 (appendix A.5.3. fast_xmit()) and RFC 4330 (Section 5.
SNTP Client Operations), but it is what it is.

Because of the former strict constraint in the built-in NTP client,
it was not able to work with such NTP servers.

This patch relaxes the requirement for expected NTP versions sent in
a server's response.  For details, see the corresponding comment
in builtin_ntp.cc.

As for testing and verification, I ran the updated code and verified
that the built-in NTP client is now able to work with the NTP server
mentioned above, tracking true time as expected.  As a side note,
I also explored options to implement a test scenario using MiniChronyd,
but I found no way to configure chronyd to behave in a non-conforming
manner described above.

Change-Id: I1df3549616bbced696380458978c33264ed636c9
---
M src/kudu/clock/builtin_ntp.cc
1 file changed, 17 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/15274/2
-- 
To view, visit http://gerrit.cloudera.org:8080/15274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1df3549616bbced696380458978c33264ed636c9
Gerrit-Change-Number: 15274
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [clock] relax version constraint for NTP response

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15274 )

Change subject: [clock] relax version constraint for NTP response
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG@13
PS1, Line 13: interestingly enough, the dedicated NTP servers of
            : other EC2 t2.large instances in my test cluster behaved as expected
> I'd expect the NTP server behavior to be a function of the NTP server softw
Nope, there was not a difference: all are ami-0198ddd64383b5bb0 (AMI named 'cloudera-systest-base-centos-7.5').  The only difference is the type of EC2 instances: in my test cluster, I have four t2.large instances and one r5ad.xlarge instance.  All t2.large instances show the same conformant behavior, and only the r5ad.xlarge instance behaves in a different way.  All NTP servers report they are of version 4, but there is a difference in behavior.  In addition, all t2.large instances report precision of 0.000000015 seconds while r5ad.xlarge report precision of 0.000003815 seconds.  Also, TX timestamping is different.  In other words, apparently those are different NTP server implementations.
 

`chronyc ntpdata 169.254.169.123` for t2.large:
Remote address  : 169.254.169.123 (A9FEA97B)
Remote port     : 123
Local address   : 10.65.11.29 (0A410B1D)
Leap status     : Normal
Version         : 4
Mode            : Server
Stratum         : 3
Poll interval   : 8 (256 seconds)
Precision       : -26 (0.000000015 seconds)
Root delay      : 0.000259 seconds
Root dispersion : 0.000275 seconds
Reference ID    : 0A81F8C4 ()
Reference time  : Mon Feb 24 06:09:52 2020
Offset          : +0.000034901 seconds
Peer delay      : 0.000251005 seconds
Peer dispersion : 0.000000373 seconds
Response time   : 0.000113064 seconds
Jitter asymmetry: +0.00
NTP tests       : 111 111 1111
Interleaved     : No
Authenticated   : No
TX timestamping : Daemon
RX timestamping : Kernel
Total TX        : 902
Total RX        : 902
Total valid RX  : 902



`chronyc ntpdata 169.254.169.123` for r5ad.xlarge:
Remote address  : 169.254.169.123 (A9FEA97B)
Remote port     : 123
Local address   : 10.65.14.145 (0A410E91)
Leap status     : Normal
Version         : 4
Mode            : Server
Stratum         : 3
Poll interval   : 6 (64 seconds)
Precision       : -18 (0.000003815 seconds)
Root delay      : 0.000183 seconds
Root dispersion : 0.000259 seconds
Reference ID    : A9FEA97A ()
Reference time  : Mon Feb 24 06:12:20 2020
Offset          : +0.000004004 seconds
Peer delay      : 0.000117991 seconds
Peer dispersion : 0.000003855 seconds
Response time   : 0.000014430 seconds
Jitter asymmetry: -0.39
NTP tests       : 111 111 1111
Interleaved     : No
Authenticated   : No
TX timestamping : Kernel
RX timestamping : Kernel
Total TX        : 1262
Total RX        : 1262
Total valid RX  : 1262


http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG@17
PS1, Line 17: Secion
> Section
Done


http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG@20
PS1, Line 20: the
> of the
Done


http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG@29
PS1, Line 29: As a side note,
            : I also explored options to implement a test scenario using MiniChronyd,
            : but I found no way to configure chronyd to behave in a non-conforming
            : manner described above.
> Seems like it'd be good to have the ability to create a "mock" chronyd that
Sure, that's my concern as well.  I'm planning to take a closer look at the idea of a mock NTP server this week.


http://gerrit.cloudera.org:8080/#/c/15274/1/src/kudu/clock/builtin_ntp.cc
File src/kudu/clock/builtin_ntp.cc:

http://gerrit.cloudera.org:8080/#/c/15274/1/src/kudu/clock/builtin_ntp.cc@732
PS1, Line 732: Secion
> Section
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1df3549616bbced696380458978c33264ed636c9
Gerrit-Change-Number: 15274
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 24 Feb 2020 17:16:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] [clock] relax version constraint for NTP response

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15274 )

Change subject: [clock] relax version constraint for NTP response
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15274/1//COMMIT_MSG@13
PS1, Line 13: restingly enough, the dedicated NTP servers of other
            : instances of EC2 t2.large type in my test cluster behaved as expect
> Nope, there was not a difference: all are ami-0198ddd64383b5bb0 (AMI named 
Oh, this is a change in behavior of the _AWS NTP server_. I thought you were referring to a change in chronyd running inside the VM.

It makes sense that Amazon could have altered the behavior for instances of different classes/generations. I don't know why they'd do that, but at least I can see how it'd be possible.



-- 
To view, visit http://gerrit.cloudera.org:8080/15274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1df3549616bbced696380458978c33264ed636c9
Gerrit-Change-Number: 15274
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 24 Feb 2020 20:55:23 +0000
Gerrit-HasComments: Yes