Posted to commits@kudu.apache.org by mp...@apache.org on 2018/02/13 01:30:40 UTC
[1/3] kudu git commit: logging: fix UBSAN unsigned int overflow in LogThrottler
Repository: kudu
Updated Branches:
refs/heads/master 84e0f3033 -> 136b8058f
logging: fix UBSAN unsigned int overflow in LogThrottler
Fixes the following UBSAN error:
src/kudu/util/logging.h:333:12: runtime error: unsigned integer
overflow: 563980051 - 563991872 cannot be represented in type 'unsigned
long'
This was happening because we used an unsigned int64 when subtracting
timestamps, and because this function is intentionally racy, it was
possible to underflow and end up negative.
No real functional issue fixed here since the worst that would happen
was an extra (non-throttled) log message when the race triggered.
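To make the failure mode concrete, here is a minimal C++ sketch (not the actual LogThrottler code) that uses the two timestamps from the UBSAN report. The helper names `ElapsedUnsigned`, `ElapsedSigned`, and `ShouldLog` are hypothetical, and `MicrosecondsInt64` is assumed here to be a signed 64-bit typedef.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: stand-in for Kudu's MicrosecondsInt64, taken to be signed 64-bit.
using MicrosecondsInt64 = int64_t;

// Elapsed time with unsigned arithmetic. If a racing thread has already
// stored a newer timestamp (last > now), the subtraction wraps around to a
// huge positive value instead of going negative.
inline uint64_t ElapsedUnsigned(uint64_t now, uint64_t last) {
  return now - last;
}

// Elapsed time with signed arithmetic. The same race yields a small
// negative value, which is representable and easy to reason about.
inline MicrosecondsInt64 ElapsedSigned(MicrosecondsInt64 now,
                                       MicrosecondsInt64 last) {
  return now - last;
}

// Hypothetical throttle check: log only if at least interval_us has passed.
inline bool ShouldLog(MicrosecondsInt64 now, MicrosecondsInt64 last,
                      MicrosecondsInt64 interval_us) {
  assert(interval_us >= 0);
  return ElapsedSigned(now, last) >= interval_us;
}
```

With the values from the report, `ElapsedUnsigned(563980051, 563991872)` wraps to 18446744073709539795, which exceeds any throttle interval and so lets an extra message through. `ElapsedSigned` returns -11821 for the same inputs, so in this sketch the message stays throttled.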
Change-Id: Ib2078b5f49dc3c751b4bb7db893506494c758289
Reviewed-on: http://gerrit.cloudera.org:8080/9289
Reviewed-by: Dan Burkert <da...@cloudera.com>
Tested-by: Todd Lipcon <to...@apache.org>
Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/c9c86f47
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/c9c86f47
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/c9c86f47
Branch: refs/heads/master
Commit: c9c86f4788c42b864bf0e8aaebcd03d75f5a6e9b
Parents: 84e0f30
Author: Todd Lipcon <to...@apache.org>
Authored: Mon Feb 12 15:26:16 2018 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Tue Feb 13 00:12:41 2018 +0000
----------------------------------------------------------------------
src/kudu/util/logging.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/kudu/blob/c9c86f47/src/kudu/util/logging.h
----------------------------------------------------------------------
diff --git a/src/kudu/util/logging.h b/src/kudu/util/logging.h
index bfc4c59..442f94b 100644
--- a/src/kudu/util/logging.h
+++ b/src/kudu/util/logging.h
@@ -340,7 +340,7 @@ class LogThrottler {
}
private:
Atomic32 num_suppressed_;
- uint64_t last_ts_;
+ MicrosecondsInt64 last_ts_;
const char* last_tag_;
};
} // namespace logging
[2/3] kudu git commit: docs: improvements to NTP troubleshooting
Posted by mp...@apache.org.
docs: improvements to NTP troubleshooting
Change-Id: I07b6871b91ed4ee08992d2fcd093f1054c7d61b8
Reviewed-on: http://gerrit.cloudera.org:8080/9234
Reviewed-by: Will Berkeley <wd...@gmail.com>
Tested-by: Kudu Jenkins
Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/60eca012
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/60eca012
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/60eca012
Branch: refs/heads/master
Commit: 60eca0125c9383fa67b304b15b64728b8f153ceb
Parents: c9c86f4
Author: Todd Lipcon <to...@apache.org>
Authored: Tue Feb 6 17:07:58 2018 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Tue Feb 13 01:09:08 2018 +0000
----------------------------------------------------------------------
docs/troubleshooting.adoc | 151 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 143 insertions(+), 8 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/kudu/blob/60eca012/docs/troubleshooting.adoc
----------------------------------------------------------------------
diff --git a/docs/troubleshooting.adoc b/docs/troubleshooting.adoc
index 34ac291..95557e3 100644
--- a/docs/troubleshooting.adoc
+++ b/docs/troubleshooting.adoc
@@ -94,8 +94,8 @@ or
Sep 17, 8:32:31.135 PM FATAL tablet_server_main.cc:38 Check failed: _s.ok() Bad status: Service unavailable: Cannot initialize clock: Cannot initialize HybridClock. Clock synchronized but error was too high (11711000 us).
----
-TIP: If NTP is installed the user can monitor the synchronization status by running
-`ntptime`. The relevant value is what is reported for `maximum error`.
+==== Installing NTP
+
To install NTP, use the appropriate command for your operating system:
[cols="1,1", options="header"]
@@ -113,14 +113,149 @@ If NTP is installed but not running, start it using one of these commands:
| RHEL/CentOS | `sudo /etc/init.d/ntpd restart`
|===
-TIP: NTP requires a network connection and may take a few minutes to synchronize the clock.
-In some cases a spotty network connection may make NTP report the clock as unsynchronized.
+==== Monitoring NTP Status
+
+When NTP is installed, you can monitor the synchronization status by running
+`ntptime`. For example, a healthy system may report:
+
+----
+ntp_gettime() returns code 0 (OK)
+ time de24c0cf.8d5da274 Tue, Feb 6 2018 16:03:27.552, (.552210980),
+ maximum error 224455 us, estimated error 383 us, TAI offset 0
+ntp_adjtime() returns code 0 (OK)
+ modes 0x0 (),
+ offset 1279.543 us, frequency 2.500 ppm, interval 1 s,
+ maximum error 224455 us, estimated error 383 us,
+ status 0x2001 (PLL,NANO),
+ time constant 10, precision 0.001 us, tolerance 500 ppm,
+----
+
+In particular, note the following most important pieces of output:
+
+- `maximum error 224455 us`: this value is well under the 10-second maximum error required
+ by Kudu.
+- `status 0x2001 (PLL,NANO)`: this indicates a healthy synchronization status.
+
+In contrast, a system without NTP properly configured and running will output
+something like the following:
+
+----
+ntp_gettime() returns code 5 (ERROR)
+ time de24c240.0c006000 Tue, Feb 6 2018 16:09:36.046, (.046881),
+ maximum error 16000000 us, estimated error 16000000 us, TAI offset 0
+ntp_adjtime() returns code 5 (ERROR)
+ modes 0x0 (),
+ offset 0.000 us, frequency 2.500 ppm, interval 1 s,
+ maximum error 16000000 us, estimated error 16000000 us,
+ status 0x40 (UNSYNC),
+ time constant 10, precision 1.000 us, tolerance 500 ppm,
+----
+
+Note the `UNSYNC` status and the 16-second maximum error.
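The two checks described above (maximum error and synchronization status) can be automated. The following is a minimal C++ sketch, not part of Kudu: `NtpTimeStatus`, `ParseNtptime`, and `LooksHealthy` are hypothetical names, the regular expressions assume the `ntptime` output layout shown in the samples, and the 10-second default mirrors the limit mentioned in the text.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical summary of the two fields of interest in `ntptime` output.
struct NtpTimeStatus {
  int64_t max_error_us = 0;
  bool unsynced = false;
};

// Extracts the first "maximum error N us" value and the status flags from
// captured `ntptime` output. Returns false if either field is missing.
inline bool ParseNtptime(const std::string& output, NtpTimeStatus* out) {
  assert(out != nullptr);
  static const std::regex kMaxError(R"(maximum error (\d+) us)");
  static const std::regex kStatus(R"(status 0x[0-9a-fA-F]+ \(([^)]*)\))");
  std::smatch err_match;
  std::smatch status_match;
  if (!std::regex_search(output, err_match, kMaxError) ||
      !std::regex_search(output, status_match, kStatus)) {
    return false;
  }
  out->max_error_us = std::stoll(err_match[1].str());
  out->unsynced = status_match[1].str().find("UNSYNC") != std::string::npos;
  return true;
}

// Healthy means: synchronized, and maximum error within the allowed limit
// (assumed default: 10 seconds, expressed in microseconds).
inline bool LooksHealthy(const NtpTimeStatus& s,
                         int64_t max_allowed_us = 10000000) {
  return !s.unsynced && s.max_error_us <= max_allowed_us;
}
```

Fed the first sample output, this sketch reports a maximum error of 224455 us and a healthy status. Fed the second, it flags `UNSYNC` and a maximum error above the limit.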
+
+If more detailed information is needed, the `ntpq` or `ntpdc` tools
+can be used to dump further information about which network time servers
+are currently acting as sources:
+
+----
+$ ntpq -n -c opeers
+ remote local st t when poll reach delay offset disp
+==============================================================================
+ 0.0.0.0 0.0.0.0 16 p - 64 0 0.000 0.000 16000.0
+ 0.0.0.0 0.0.0.0 16 p - 64 0 0.000 0.000 16000.0
+ 0.0.0.0 0.0.0.0 16 p - 64 0 0.000 0.000 16000.0
+ 0.0.0.0 0.0.0.0 16 p - 64 0 0.000 0.000 16000.0
+ 0.0.0.0 0.0.0.0 16 p - 64 0 0.000 0.000 16000.0
+-108.59.2.24 10.16.2.89 2 u 3 64 3 74.380 0.321 62.992
+-208.82.104.205 10.16.2.89 2 u 5 64 3 52.654 -4.054 62.965
+#192.96.202.120 10.16.2.89 2 u 1 64 3 74.737 6.538 62.988
+#69.10.161.7 10.16.2.89 3 u 5 64 3 28.353 -1.967 62.960
+-173.255.206.154 10.16.2.89 3 u - 64 3 42.906 -3.127 62.996
+-69.195.159.158 10.16.2.89 2 u 1 64 3 52.543 -4.788 62.987
+*216.218.254.202 10.16.2.89 1 u 5 64 3 2.567 0.053 62.974
+-129.250.35.250 10.16.2.89 2 u 3 64 3 2.603 0.256 62.985
++45.76.244.193 10.16.2.89 2 u 5 64 3 19.522 0.188 62.969
+-69.89.207.199 10.16.2.89 2 u 5 64 3 66.687 -0.395 62.967
+-171.66.97.126 10.16.2.89 1 u 1 64 3 12.627 -3.572 62.963
+#66.228.42.59 10.16.2.89 4 u 1 64 3 72.143 4.034 62.971
+ 91.189.89.198 10.16.2.89 2 u 5 64 3 135.329 3.069 3937.74
+#162.210.111.4 10.16.2.89 2 u - 64 3 29.572 6.849 62.966
++199.102.46.80 10.16.2.89 1 u 3 64 3 57.022 0.111 63.386
+ 91.189.89.199 10.16.2.89 2 u 4 64 3 138.269 3.228 3937.98
+----
+
+TIP: Depending on the specific version of NTP, the correct command may be either
+`ntpq -n -c opeers` or `ntpq -n -c lpeers`.
+
+
+[NOTE]
+****
+.Using `chrony` for time synchronization
+
+Some operating systems offer `chrony` as an alternative to `ntpd` for network time
+synchronization. Kudu has been tested most thoroughly using `ntpd` and use of
+`chrony` is considered experimental.
+
+In order to use `chrony` for synchronization, `chrony.conf` must be configured
+with the `rtcsync` option.
+****
+
+==== NTP Configuration Best Practices
+
+In order to provide stable time synchronization with low maximum error, follow
+these NTP configuration best practices.
+
+*Always configure at least four time sources for NTP.* In addition to providing
+redundancy in case one or more time sources become unavailable, the NTP protocol is
+designed to increase its accuracy with a diversity of sources. Even if your organization
+provides one or more local time servers, configuring additional remote servers is highly
+recommended for a robust setup.
+
+*Pick servers in your server's local geography.* For example, if your servers are located
+in Europe, pick servers from the European NTP pool. If your servers are running in a public
+cloud environment, consult the cloud provider's documentation for a recommended NTP setup.
+Many cloud providers offer highly accurate clock synchronization as a service.
+
+*Use the `iburst` option for faster synchronization at startup*. The `iburst` option
+instructs `ntpd` to send an initial "burst" of time queries at startup. This typically
+results in a faster time synchronization when a machine restarts.
+
+An example NTP server list may appear as follows:
+
+----
+# Use my organization's internal NTP servers.
+server ntp1.myorg.internal iburst
+server ntp2.myorg.internal iburst
+# Provide several public pool servers from the US pool for
+# redundancy and robustness.
+server 0.pool.us.ntp.org iburst
+server 1.pool.us.ntp.org iburst
+server 2.pool.us.ntp.org iburst
+server 3.pool.us.ntp.org iburst
+----
+
+TIP: After configuring NTP, use the `ntpq` tool described above to verify that `ntpd` was
+able to connect to a variety of peers. If no public peers appear, it is possible that
+the NTP protocol is being blocked by a firewall or other network connectivity issue.
+
+==== Troubleshooting NTP Stability Problems
+
+As of Kudu 1.6.0, Kudu daemons are able to continue to operate during a brief loss of
+NTP synchronization. If NTP synchronization is lost for several hours, however, daemons
+may crash. If a daemon crashes due to NTP synchronization issues, consult the `ERROR` log
+for a dump of related information which may help to diagnose the issue.
+
+TIP: Kudu 1.5.0 and earlier versions were less resilient to brief NTP outages. In
+addition, they contained a link:https://issues.apache.org/jira/browse/KUDU-2209[bug]
+which could cause Kudu to incorrectly measure the maximum error, resulting in
+crashes. If you experience crashes related to clock synchronization on these
+earlier versions of Kudu and it appears that the system's NTP configuration is correct,
+consider upgrading to Kudu 1.6.0 or later.
+
+TIP: NTP requires a network connection and may take a few minutes to synchronize the clock
+at startup. In some cases a spotty network connection may make NTP report the clock as unsynchronized.
A common, though temporary, workaround for this is to restart NTP with one of the commands above.
-If the clock is being reported as synchronized by NTP, but the maximum error is too high,
-the user can increase the threshold to a higher value by setting the above
-mentioned flag. For example to increase the possible maximum error to
-20 seconds the flag should be set like: `--max_clock_sync_error_usec=20000000`
[[crash_reporting]]
== Reporting Kudu Crashes
[3/3] kudu git commit: docs: Update release management documentation
Posted by mp...@apache.org.
docs: Update release management documentation
Change-Id: I43575df56bb36e49a06feffe6efac96a52347c24
Reviewed-on: http://gerrit.cloudera.org:8080/8744
Reviewed-by: Dan Burkert <da...@cloudera.com>
Tested-by: Dan Burkert <da...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/136b8058
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/136b8058
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/136b8058
Branch: refs/heads/master
Commit: 136b8058fb1b7206216c1962b6b1f8a6927b8e3b
Parents: 60eca01
Author: Mike Percy <mp...@apache.org>
Authored: Fri Dec 1 23:42:18 2017 -0800
Committer: Mike Percy <mp...@apache.org>
Committed: Tue Feb 13 01:27:27 2018 +0000
----------------------------------------------------------------------
RELEASING.adoc | 89 ++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 74 insertions(+), 15 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/kudu/blob/136b8058/RELEASING.adoc
----------------------------------------------------------------------
diff --git a/RELEASING.adoc b/RELEASING.adoc
index ebc0039..3646069 100644
--- a/RELEASING.adoc
+++ b/RELEASING.adoc
@@ -39,7 +39,7 @@ in `master`.
----
git checkout master
git pull
- git checkout -b branch-0.9.x
+ git checkout -b branch-1.x.y
----
. Make a note of the SHA1 for the tip of the new branch, which is the first
@@ -54,15 +54,28 @@ http://git-wip-us.apache.org/repos/asf?p=kudu.git. The following example
assumes they are called `cloudera` and `apache`.
+
----
- git push cloudera branch-0.9.x
- git push apache branch-0.9.x
+ git push cloudera branch-1.x.y
+ git push apache branch-1.x.y
----
. Create a new branch on Gerrit. Go to
http://gerrit.cloudera.org:8080/#/admin/projects/kudu,branches and create a new
branch with the same name and the previously-noted SHA1.
-. Notify Todd to fix the mirroring. He will know what that means.
+. Ask someone with permissions to fix the gerrit.cloudera.org mirroring
+ configuration. Cloudera hosts the Gerrit server and a Cloudera employee will
+ have to perform this step because SSH access is behind a firewall. The steps
+ are as follows:
+ 1. Ensure your public SSH key is in `~gerrit/.ssh/authorized_keys` on gerrit.cloudera.org
+ 2. From behind the firewall, `ssh gerrit@gerrit.cloudera.org` to log in.
+ 3. Back up the existing replication configuration file by executing
+ `cp etc/replication.config etc/replication.config.bak.\`date '+%Y%m%d.%H%M%S'\``
+ 4. Edit `etc/replication.config` to add a line for the new branch, such as `branch-1.x.y`
+ 5. Send email to the dev lists for Kudu and Impala (dev@kudu.apache.org and
+ dev@impala.apache.org) indicating that you are going to restart Gerrit
+ (link:https://s.apache.org/2Wj7[example]). It is best to do the restart at
+ some time of day when you don't expect many people to be using the system,
+ since Gerrit can take a few minutes to restart.
. As needed, patches can be cherry-picked to the new branch.
@@ -74,7 +87,7 @@ branch with the same name and the previously-noted SHA1.
+
----
cd java
- mvn versions:set -DnewVersion=0.X.0-SNAPSHOT
+ mvn versions:set -DnewVersion=1.x.y-SNAPSHOT
----
. Update the version in `java/gradle.properties`.
@@ -98,6 +111,43 @@ branch with the same name and the previously-noted SHA1.
. Fix any issues it finds, such as RAT.
+. Add the following information to your `~/.m2/settings.xml` file in order to
+ be able to deploy artifacts to the ASF Maven repository:
++
+----
+<settings>
+ <servers>
+ <server>
+ <id>apache.snapshots.https</id>
+ <username> <!-- YOUR APACHE LDAP USERNAME --> </username>
+ <password> <!-- YOUR APACHE LDAP PASSWORD (encrypted) --> </password>
+ </server>
+ <!-- To stage a release of some part of Maven -->
+ <server>
+ <id>apache.releases.https</id>
+ <username> <!-- YOUR APACHE LDAP USERNAME --> </username>
+ <password> <!-- YOUR APACHE LDAP PASSWORD (encrypted) --> </password>
+ </server>
+ </servers>
+</settings>
+----
++
+If you don't want to keep your ASF password in plaintext on your local machine,
+you can link:http://maven.apache.org/guides/mini/guide-encryption.html[encrypt it].
+
+. Test the full Java build. This will sign and build everything without
+ deploying any artifacts:
++
+----
+ # Run a gpg-agent if you don't normally. You may have to tweak it to get it
+ # to work with Maven, and this StackOverflow article might help:
+ # https://stackoverflow.com/questions/36506275/why-do-i-have-to-kill-gpg-agent-to-sign-my-commits
+ gpg-agent --daemon
+ cd java
+ mvn -DskipTests -Papache-release clean install
+----
++
+
. Create a new version update commit which removes the -SNAPSHOT suffix (same
process as above).
@@ -124,10 +174,10 @@ branch with the same name and the previously-noted SHA1.
# Run a gpg-agent if you don't normally
gpg-agent --daemon
cd java
- mvn -DskipTests clean -Papache-release clean deploy
+ mvn -DskipTests -Papache-release clean deploy
----
+
-Go to the link:https://repository.apache.org/#stagingRepositories[staging repository]
+Go to the link:https://repository.apache.org/\#stagingRepositories[staging repository]
and look for ‘orgapachekudu-####’ in the staging repositories list. You can
check the ‘content’ tab at the bottom to make sure you have all of the expected
stuff (client, various integrations, both versions of Spark). Hit the checkbox
@@ -136,6 +186,15 @@ whatever into that box. Wait a minute or two and hit refresh, and your staging
repo should now have a URL shown in its summary tab (eg
`https://repository.apache.org/content/repositories/orgapachekudu-1005`)
+. Add your PGP key to the KEYS file:
++
+----
+svn co https://dist.apache.org/repos/dist/release/kudu/ kudu-dist-release
+cd kudu-dist-release
+(gpg --list-sigs <your-email-address> && gpg --armor --export <your-email-address>) >> KEYS
+svn commit -m "Adding my key to the KEYS file"
+----
+
== Initiating a Vote for an RC
. Send an email to dev@kudu.apache.org to start the RC process, using
@@ -160,18 +219,18 @@ repo should now have a URL shown in its summary tab (eg
+
----
cd kudu
- mkdir 0.9.2
- cp <path_to_rc_artifacts>/* 0.9.2
- svn add 0.9.2
- svn commit -m "Adding files for Kudu 0.9.2 RC"
+ mkdir 1.x.y
+ cp <path_to_rc_artifacts>/* 1.x.y
+ svn add 1.x.y
+ svn commit -m "Adding files for Kudu 1.x.y RC"
----
. In the Kudu git repo, create a signed tag from the RC’s tag, and push it to the
Apache Git repository:
+
----
- git tag -s 0.9.2 -m 'Release Apache Kudu 0.9.2' 0.9.2-RC1
- git push apache 0.9.2-RC1
+ git tag -s 1.x.y -m 'Release Apache Kudu 1.x.y' 1.x.y-RC1
+ git push apache 1.x.y-RC1
----
. Release the staged Java artifacts. Select the release candidate staging
@@ -196,10 +255,10 @@ Apache Git repository:
. About 24 hours after the first step was completed, send an email to
user@kudu.apache.org, dev@kudu.apache.org, and announce@apache.org
to announce the release. The email should be similar to
- link:http://mail-archives.us.apache.org/mod_mbox/www-announce/201606.mbox/%3CCAGpTDNeHW53US=qdpQPCQk0WaFBxx_KNx1E9b6NBBnbWpkSpmQ@mail.gmail.com%3E[this].
+ link:https://s.apache.org/pduz[this].
. About another 24 hours later, delete the previous minor version in the branch
you released from, from SVN. For example, if you released 1.2.1, delete `1.2.0`.
. Update the version number on the branch you released from back to a SNAPSHOT
- for the next patch release, such as `0.9.2-SNAPSHOT` after the `0.9.1` release.
+ for the next patch release, such as `1.6.1-SNAPSHOT` after the `1.6.0` release.