Posted to commits@kudu.apache.org by mp...@apache.org on 2018/02/13 01:30:40 UTC

[1/3] kudu git commit: logging: fix UBSAN unsigned int overflow in LogThrottler

Repository: kudu
Updated Branches:
  refs/heads/master 84e0f3033 -> 136b8058f


logging: fix UBSAN unsigned int overflow in LogThrottler

Fixes the following UBSAN error:

  src/kudu/util/logging.h:333:12: runtime error: unsigned integer
  overflow: 563980051 - 563991872 cannot be represented in type 'unsigned
  long'

This was happening because we used an unsigned 64-bit integer when subtracting
timestamps. Because this function is intentionally racy, the stored timestamp
could be newer than the current one, so the subtraction could wrap around.

No real functional issue fixed here since the worst that would happen
was an extra (non-throttled) log message when the race triggered.

Change-Id: Ib2078b5f49dc3c751b4bb7db893506494c758289
Reviewed-on: http://gerrit.cloudera.org:8080/9289
Reviewed-by: Dan Burkert <da...@cloudera.com>
Tested-by: Todd Lipcon <to...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/c9c86f47
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/c9c86f47
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/c9c86f47

Branch: refs/heads/master
Commit: c9c86f4788c42b864bf0e8aaebcd03d75f5a6e9b
Parents: 84e0f30
Author: Todd Lipcon <to...@apache.org>
Authored: Mon Feb 12 15:26:16 2018 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Tue Feb 13 00:12:41 2018 +0000

----------------------------------------------------------------------
 src/kudu/util/logging.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/c9c86f47/src/kudu/util/logging.h
----------------------------------------------------------------------
diff --git a/src/kudu/util/logging.h b/src/kudu/util/logging.h
index bfc4c59..442f94b 100644
--- a/src/kudu/util/logging.h
+++ b/src/kudu/util/logging.h
@@ -340,7 +340,7 @@ class LogThrottler {
   }
  private:
   Atomic32 num_suppressed_;
-  uint64_t last_ts_;
+  MicrosecondsInt64 last_ts_;
   const char* last_tag_;
 };
 } // namespace logging
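
The effect of the type change can be sketched in isolation. The following C++
is a minimal illustration, not the actual LogThrottler code: the function names
and the one-second period are hypothetical, and it assumes MicrosecondsInt64 is
a signed 64-bit integer, as its name suggests. The timestamps in the usage note
below are the two values from the UBSAN report.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a "has the throttling period elapsed?" check.
// With unsigned operands, a last_ts_us that is newer than now_us (possible
// when another thread races ahead) makes the difference wrap to a huge value,
// so the check passes and the message is logged without throttling.
bool ElapsedUnsigned(uint64_t now_us, uint64_t last_ts_us, uint64_t period_us) {
  return now_us - last_ts_us >= period_us;
}

// Same check with signed operands: the difference is simply negative, so the
// check fails and the message stays throttled.
bool ElapsedSigned(int64_t now_us, int64_t last_ts_us, int64_t period_us) {
  assert(period_us >= 0);  // a negative period would make the check meaningless
  return now_us - last_ts_us >= period_us;
}
```

With now_us = 563980051, last_ts_us = 563991872 and a period of 1000000 us,
ElapsedUnsigned returns true while ElapsedSigned returns false (the signed
difference is -11821). That matches the commit message: the only consequence
of the wraparound was an extra non-throttled log message.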


[2/3] kudu git commit: docs: improvements to NTP troubleshooting

Posted by mp...@apache.org.
docs: improvements to NTP troubleshooting

Change-Id: I07b6871b91ed4ee08992d2fcd093f1054c7d61b8
Reviewed-on: http://gerrit.cloudera.org:8080/9234
Reviewed-by: Will Berkeley <wd...@gmail.com>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/60eca012
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/60eca012
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/60eca012

Branch: refs/heads/master
Commit: 60eca0125c9383fa67b304b15b64728b8f153ceb
Parents: c9c86f4
Author: Todd Lipcon <to...@apache.org>
Authored: Tue Feb 6 17:07:58 2018 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Tue Feb 13 01:09:08 2018 +0000

----------------------------------------------------------------------
 docs/troubleshooting.adoc | 151 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 143 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/60eca012/docs/troubleshooting.adoc
----------------------------------------------------------------------
diff --git a/docs/troubleshooting.adoc b/docs/troubleshooting.adoc
index 34ac291..95557e3 100644
--- a/docs/troubleshooting.adoc
+++ b/docs/troubleshooting.adoc
@@ -94,8 +94,8 @@ or
 Sep 17, 8:32:31.135 PM FATAL tablet_server_main.cc:38 Check failed: _s.ok() Bad status: Service unavailable: Cannot initialize clock: Cannot initialize HybridClock. Clock synchronized but error was too high (11711000 us).
 ----
 
-TIP: If NTP is installed the user can monitor the synchronization status by running
-`ntptime`. The relevant value is what is reported for `maximum error`.
+==== Installing NTP
+
 
 To install NTP, use the appropriate command for your operating system:
 [cols="1,1", options="header"]
@@ -113,14 +113,149 @@ If NTP is installed but not running, start it using one of these commands:
 | RHEL/CentOS | `sudo /etc/init.d/ntpd restart`
 |===
 
-TIP: NTP requires a network connection and may take a few minutes to synchronize the clock.
-In some cases a spotty network connection may make NTP report the clock as unsynchronized.
+==== Monitoring NTP Status
+
+When NTP is installed, you can monitor the synchronization status by running
+`ntptime`. For example, a healthy system may report:
+
+----
+ntp_gettime() returns code 0 (OK)
+  time de24c0cf.8d5da274  Tue, Feb  6 2018 16:03:27.552, (.552210980),
+  maximum error 224455 us, estimated error 383 us, TAI offset 0
+ntp_adjtime() returns code 0 (OK)
+  modes 0x0 (),
+  offset 1279.543 us, frequency 2.500 ppm, interval 1 s,
+  maximum error 224455 us, estimated error 383 us,
+  status 0x2001 (PLL,NANO),
+  time constant 10, precision 0.001 us, tolerance 500 ppm,
+----
+
+In particular, note the following key pieces of output:
+
+- `maximum error 224455 us`: this value is well under the 10-second maximum error required
+  by Kudu.
+- `status 0x2001 (PLL,NANO)`: this indicates a healthy synchronization status.
+
+In contrast, a system without NTP properly configured and running will output
+something like the following:
+
+----
+ntp_gettime() returns code 5 (ERROR)
+  time de24c240.0c006000  Tue, Feb  6 2018 16:09:36.046, (.046881),
+  maximum error 16000000 us, estimated error 16000000 us, TAI offset 0
+ntp_adjtime() returns code 5 (ERROR)
+  modes 0x0 (),
+  offset 0.000 us, frequency 2.500 ppm, interval 1 s,
+  maximum error 16000000 us, estimated error 16000000 us,
+  status 0x40 (UNSYNC),
+  time constant 10, precision 1.000 us, tolerance 500 ppm,
+----
+
+Note the `UNSYNC` status and the 16-second maximum error.
+
+If more detailed information is needed, the `ntpq` or `ntpdc` tools
+can be used to dump further information about which network time servers
+are currently acting as sources:
+
+----
+$ ntpq -n -c opeers
+     remote           local      st t when poll reach   delay   offset    disp
+==============================================================================
+ 0.0.0.0         0.0.0.0         16 p    -   64    0    0.000    0.000 16000.0
+ 0.0.0.0         0.0.0.0         16 p    -   64    0    0.000    0.000 16000.0
+ 0.0.0.0         0.0.0.0         16 p    -   64    0    0.000    0.000 16000.0
+ 0.0.0.0         0.0.0.0         16 p    -   64    0    0.000    0.000 16000.0
+ 0.0.0.0         0.0.0.0         16 p    -   64    0    0.000    0.000 16000.0
+-108.59.2.24     10.16.2.89       2 u    3   64    3   74.380    0.321  62.992
+-208.82.104.205  10.16.2.89       2 u    5   64    3   52.654   -4.054  62.965
+#192.96.202.120  10.16.2.89       2 u    1   64    3   74.737    6.538  62.988
+#69.10.161.7     10.16.2.89       3 u    5   64    3   28.353   -1.967  62.960
+-173.255.206.154 10.16.2.89       3 u    -   64    3   42.906   -3.127  62.996
+-69.195.159.158  10.16.2.89       2 u    1   64    3   52.543   -4.788  62.987
+*216.218.254.202 10.16.2.89       1 u    5   64    3    2.567    0.053  62.974
+-129.250.35.250  10.16.2.89       2 u    3   64    3    2.603    0.256  62.985
++45.76.244.193   10.16.2.89       2 u    5   64    3   19.522    0.188  62.969
+-69.89.207.199   10.16.2.89       2 u    5   64    3   66.687   -0.395  62.967
+-171.66.97.126   10.16.2.89       1 u    1   64    3   12.627   -3.572  62.963
+#66.228.42.59    10.16.2.89       4 u    1   64    3   72.143    4.034  62.971
+ 91.189.89.198   10.16.2.89       2 u    5   64    3  135.329    3.069 3937.74
+#162.210.111.4   10.16.2.89       2 u    -   64    3   29.572    6.849  62.966
++199.102.46.80   10.16.2.89       1 u    3   64    3   57.022    0.111  63.386
+ 91.189.89.199   10.16.2.89       2 u    4   64    3  138.269    3.228 3937.98
+----
+
+TIP: Depending on the specific version of NTP, the correct command may be either
+`ntpq -n -c opeers` or `ntpq -n -c lpeers`.
+
+
+[NOTE]
+****
+.Using `chrony` for time synchronization
+
+Some operating systems offer `chrony` as an alternative to `ntpd` for network time
+synchronization. Kudu has been tested most thoroughly using `ntpd` and use of
+`chrony` is considered experimental.
+
+In order to use `chrony` for synchronization, `chrony.conf` must be configured
+with the `rtcsync` option.
+****
+
+==== NTP Configuration Best Practices
+
+In order to provide stable time synchronization with low maximum error, follow
+these NTP configuration best practices.
+
+*Always configure at least four time sources for NTP.* Multiple sources provide
+redundancy in case one or more of them become unavailable, and the NTP protocol is
+designed to increase its accuracy with a diversity of sources. Even if your organization
+provides one or more local time servers, configuring additional remote servers is highly
+recommended for a robust setup.
+
+*Pick servers in your server's local geography.* For example, if your servers are located
+in Europe, pick servers from the European NTP pool. If your servers are running in a public
+cloud environment, consult the cloud provider's documentation for a recommended NTP setup.
+Many cloud providers offer highly accurate clock synchronization as a service.
+
+*Use the `iburst` option for faster synchronization at startup*. The `iburst` option
+instructs `ntpd` to send an initial "burst" of time queries at startup. This typically
+results in a faster time synchronization when a machine restarts.
+
+An example NTP server list may appear as follows:
+
+----
+# Use my organization's internal NTP servers.
+server ntp1.myorg.internal iburst
+server ntp2.myorg.internal iburst
+# Provide several public pool servers from the US pool for
+# redundancy and robustness.
+server 0.pool.us.ntp.org iburst
+server 1.pool.us.ntp.org iburst
+server 2.pool.us.ntp.org iburst
+server 3.pool.us.ntp.org iburst
+----
+
+TIP: After configuring NTP, use the `ntpq` tool described above to verify that `ntpd` was
+able to connect to a variety of peers. If no public peers appear, it is possible that
+the NTP protocol is being blocked by a firewall or other network connectivity issue.
+
+==== Troubleshooting NTP Stability Problems
+
+As of Kudu 1.6.0, Kudu daemons are able to continue to operate during a brief loss of
+NTP synchronization. If NTP synchronization is lost for several hours, however, daemons
+may crash. If a daemon crashes due to NTP synchronization issues, consult the `ERROR` log
+for a dump of related information which may help to diagnose the issue.
+
+TIP: Kudu 1.5.0 and earlier versions were less resilient to brief NTP outages. In
+addition, they contained a link:https://issues.apache.org/jira/browse/KUDU-2209[bug]
+which could cause Kudu to incorrectly measure the maximum error, resulting in
+crashes. If you experience crashes related to clock synchronization on these
+earlier versions of Kudu and it appears that the system's NTP configuration is correct,
+consider upgrading to Kudu 1.6.0 or later.
+
+TIP: NTP requires a network connection and may take a few minutes to synchronize the clock
+at startup. In some cases a spotty network connection may make NTP report the clock as unsynchronized.
 A common, though temporary, workaround for this is to restart NTP with one of the commands above.
 
-If the clock is being reported as synchronized by NTP, but the maximum error is too high,
-the user can increase the threshold to a higher value by setting the above
-mentioned flag. For example to increase the possible maximum error to
-20 seconds the flag should be set like: `--max_clock_sync_error_usec=20000000`
 
 [[crash_reporting]]
 == Reporting Kudu Crashes
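
The values that `ntptime` prints in the troubleshooting text above can also be
read programmatically. The following C++ is a rough sketch for Linux only, not
Kudu's own clock code: the names Classify, QueryClock and kMaxErrorUs are made
up for illustration, and the 10-second limit is taken from the documentation
text rather than from Kudu's source. It uses adjtimex(), which on Linux exposes
the same kernel state that `ntptime` reports under ntp_adjtime().

```cpp
#include <cassert>
#include <sys/timex.h>

// Assumed threshold: the 10-second maximum error mentioned in the docs.
constexpr long kMaxErrorUs = 10L * 1000 * 1000;

enum class ClockHealth { kOk, kUnsynchronized, kErrorTooHigh };

// Pure decision logic, kept separate from the system call so that it can be
// exercised with the sample values shown in the documentation.
ClockHealth Classify(int rc, int status, long max_error_us) {
  assert(max_error_us >= 0);  // the kernel reports a non-negative error bound
  // Return code 5 (TIME_ERROR) or the UNSYNC status bit means no sync.
  if (rc == TIME_ERROR || (status & STA_UNSYNC) != 0) {
    return ClockHealth::kUnsynchronized;
  }
  if (max_error_us > kMaxErrorUs) {
    return ClockHealth::kErrorTooHigh;
  }
  return ClockHealth::kOk;
}

// Queries the kernel clock state. With modes == 0 the call only reads state
// and changes nothing.
ClockHealth QueryClock() {
  struct timex tx = {};
  int rc = adjtimex(&tx);
  if (rc < 0) {
    return ClockHealth::kUnsynchronized;  // call failed; treat as unsynchronized
  }
  return Classify(rc, tx.status, tx.maxerror);
}
```

For the healthy sample output (code 0, status 0x2001, maximum error 224455 us)
Classify yields kOk; for the unsynchronized sample (code 5, status 0x40,
maximum error 16000000 us) it yields kUnsynchronized. A synchronized clock
reporting 11711000 us, as in the FATAL log message, yields kErrorTooHigh.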


[3/3] kudu git commit: docs: Update release management documentation

Posted by mp...@apache.org.
docs: Update release management documentation

Change-Id: I43575df56bb36e49a06feffe6efac96a52347c24
Reviewed-on: http://gerrit.cloudera.org:8080/8744
Reviewed-by: Dan Burkert <da...@cloudera.com>
Tested-by: Dan Burkert <da...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/136b8058
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/136b8058
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/136b8058

Branch: refs/heads/master
Commit: 136b8058fb1b7206216c1962b6b1f8a6927b8e3b
Parents: 60eca01
Author: Mike Percy <mp...@apache.org>
Authored: Fri Dec 1 23:42:18 2017 -0800
Committer: Mike Percy <mp...@apache.org>
Committed: Tue Feb 13 01:27:27 2018 +0000

----------------------------------------------------------------------
 RELEASING.adoc | 89 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 74 insertions(+), 15 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/136b8058/RELEASING.adoc
----------------------------------------------------------------------
diff --git a/RELEASING.adoc b/RELEASING.adoc
index ebc0039..3646069 100644
--- a/RELEASING.adoc
+++ b/RELEASING.adoc
@@ -39,7 +39,7 @@ in `master`.
 ----
   git checkout master
   git pull
-  git checkout -b branch-0.9.x
+  git checkout -b branch-1.x.y
 ----
 
 . Make a note of the SHA1 for the tip of the new branch, which is the first
@@ -54,15 +54,28 @@ http://git-wip-us.apache.org/repos/asf?p=kudu.git. The following example
 assumes they are called `cloudera` and `apache`.
 +
 ----
-  git push cloudera branch-0.9.x
-  git push apache branch-0.9.x
+  git push cloudera branch-1.x.y
+  git push apache branch-1.x.y
 ----
 
 . Create a new branch on Gerrit. Go to
 http://gerrit.cloudera.org:8080/#/admin/projects/kudu,branches and create a new
 branch with the same name and the previously-noted SHA1.
 
-. Notify Todd to fix the mirroring. He will know what that means.
+. Ask someone with permissions to fix the gerrit.cloudera.org mirroring
+  configuration. Cloudera hosts the Gerrit server and a Cloudera employee will
+  have to perform this step because SSH access is behind a firewall. The steps
+  are as follows:
+  1. Ensure your public SSH key is in `~gerrit/.ssh/authorized_keys` on gerrit.cloudera.org
+  2. From behind the firewall, `ssh gerrit@gerrit.cloudera.org` to log in.
+  3. Back up the existing replication configuration file by executing
+     `cp etc/replication.config etc/replication.config.bak.\`date '+%Y%m%d.%H%M%S'\``
+  4. Edit `etc/replication.config` to add a line for the new branch, such as `branch-1.x.y`
+  5. Send email to the dev lists for Kudu and Impala (dev@kudu.apache.org and
+     dev@impala.apache.org) indicating that you are going to restart Gerrit
+     (link:https://s.apache.org/2Wj7[example]). It is best to do the restart at
+     some time of day when you don't expect many people to be using the system,
+     since Gerrit can take a few minutes to restart.
 
 . As needed, patches can be cherry-picked to the new branch.
 
@@ -74,7 +87,7 @@ branch with the same name and the previously-noted SHA1.
 +
 ----
   cd java
-  mvn versions:set -DnewVersion=0.X.0-SNAPSHOT
+  mvn versions:set -DnewVersion=1.x.y-SNAPSHOT
 ----
 
 . Update the version in `java/gradle.properties`.
@@ -98,6 +111,43 @@ branch with the same name and the previously-noted SHA1.
 
 . Fix any issues it finds, such as RAT.
 
+. Add the following information to your `~/.m2/settings.xml` file in order to
+  be able to deploy artifacts to the ASF Maven repository:
++
+----
+<settings>
+  <servers>
+    <server>
+      <id>apache.snapshots.https</id>
+      <username> <!-- YOUR APACHE LDAP USERNAME --> </username>
+      <password> <!-- YOUR APACHE LDAP PASSWORD (encrypted) --> </password>
+    </server>
+    <!-- To stage a release of some part of Maven -->
+    <server>
+      <id>apache.releases.https</id>
+      <username> <!-- YOUR APACHE LDAP USERNAME --> </username>
+      <password> <!-- YOUR APACHE LDAP PASSWORD (encrypted) --> </password>
+    </server>
+  </servers>
+</settings>
+----
++
+If you don't want to keep your ASF password in plaintext on your local machine,
+you can link:http://maven.apache.org/guides/mini/guide-encryption.html[encrypt it].
+
+. Test the full Java build. This will sign and build everything without
+  deploying any artifacts:
++
+----
+  # Run a gpg-agent if you don't normally. You may have to tweak it to get it
+  # to work with Maven, and this StackOverflow article might help:
+  # https://stackoverflow.com/questions/36506275/why-do-i-have-to-kill-gpg-agent-to-sign-my-commits
+  gpg-agent --daemon
+  cd java
+  mvn -DskipTests -Papache-release clean install
+----
++
+
 . Create a new version update commit which removes the -SNAPSHOT suffix (same
   process as above).
 
@@ -124,10 +174,10 @@ branch with the same name and the previously-noted SHA1.
   # Run a gpg-agent if you don't normally
   gpg-agent --daemon
   cd java
-  mvn -DskipTests clean -Papache-release clean deploy
+  mvn -DskipTests -Papache-release clean deploy
 ----
 +
-Go to the link:https://repository.apache.org/#stagingRepositories[staging repository]
+Go to the link:https://repository.apache.org/\#stagingRepositories[staging repository]
 and look for ‘orgapachekudu-####’ in the staging repositories list. You can
 check the ‘content’ tab at the bottom to make sure you have all of the expected
 stuff (client, various integrations, both versions of Spark). Hit the checkbox
@@ -136,6 +186,15 @@ whatever into that box. Wait a minute or two and hit refresh, and your staging
 repo should now have a URL shown in its summary tab (eg
 `https://repository.apache.org/content/repositories/orgapachekudu-1005`)
 
+. Add your PGP key to the KEYS file:
++
+----
+svn co https://dist.apache.org/repos/dist/release/kudu/ kudu-dist-release
+cd kudu-dist-release
+(gpg --list-sigs <your-email-address> && gpg --armor --export <your-email-address>) >> KEYS
+svn commit -m "Adding my key to the KEYS file"
+----
+
 == Initiating a Vote for an RC
 
 . Send an email to dev@kudu.apache.org to start the RC process, using
@@ -160,18 +219,18 @@ repo should now have a URL shown in its summary tab (eg
 +
 ----
   cd kudu
-  mkdir 0.9.2
-  cp <path_to_rc_artifacts>/* 0.9.2
-  svn add 0.9.2
-  svn commit -m "Adding files for Kudu 0.9.2 RC"
+  mkdir 1.x.y
+  cp <path_to_rc_artifacts>/* 1.x.y
+  svn add 1.x.y
+  svn commit -m "Adding files for Kudu 1.x.y RC"
 ----
 
 . In the Kudu git repo, create a signed tag from the RC’s tag, and push it to the
 Apache Git repository:
 +
 ----
-  git tag -s 0.9.2 -m 'Release Apache Kudu 0.9.2' 0.9.2-RC1
-  git push apache 0.9.2-RC1
+  git tag -s 1.x.y -m 'Release Apache Kudu 1.x.y' 1.x.y-RC1
+  git push apache 1.x.y-RC1
 ----
 
 . Release the staged Java artifacts. Select the release candidate staging
@@ -196,10 +255,10 @@ Apache Git repository:
 . About 24 hours after the first step was completed, send an email to
   user@kudu.apache.org, dev@kudu.apache.org, and announce@apache.org
   to announce the release. The email should be similar to
-  link:http://mail-archives.us.apache.org/mod_mbox/www-announce/201606.mbox/%3CCAGpTDNeHW53US=qdpQPCQk0WaFBxx_KNx1E9b6NBBnbWpkSpmQ@mail.gmail.com%3E[this].
+  link:https://s.apache.org/pduz[this].
 
 . About another 24 hours later, delete the previous minor version in the branch
   you released from, from SVN. For example, if you released 1.2.1, delete `1.2.0`.
 
 . Update the version number on the branch you released from back to a SNAPSHOT
-  for the next patch release, such as `0.9.2-SNAPSHOT` after the `0.9.1` release.
+  for the next patch release, such as `1.6.1-SNAPSHOT` after the `1.6.0` release.
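
The last step's version bump follows a simple rule, which can be sketched as a
small helper. This C++ is purely illustrative and not part of the Kudu release
tooling: the function name NextPatchSnapshot is hypothetical, and it assumes
release versions always have the form MAJOR.MINOR.PATCH.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Given a released version such as "1.6.0", returns the SNAPSHOT version for
// the next patch release on the same branch, such as "1.6.1-SNAPSHOT".
std::string NextPatchSnapshot(const std::string& released) {
  int major = 0, minor = 0, patch = 0;
  char trailing = '\0';
  // Exactly three integers must match; a fourth conversion (%c) succeeding
  // means there is trailing text such as "-RC1", which is rejected.
  int matched = std::sscanf(released.c_str(), "%d.%d.%d%c",
                            &major, &minor, &patch, &trailing);
  if (matched != 3 || major < 0 || minor < 0 || patch < 0) {
    throw std::invalid_argument("expected MAJOR.MINOR.PATCH, got: " + released);
  }
  assert(trailing == '\0');  // untouched because the %c conversion did not run
  return std::to_string(major) + "." + std::to_string(minor) + "." +
         std::to_string(patch + 1) + "-SNAPSHOT";
}
```

For example, NextPatchSnapshot("1.6.0") returns "1.6.1-SNAPSHOT", the value
used in the step above, while an input such as "1.6.0-RC1" throws.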