You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by to...@apache.org on 2017/08/01 21:10:47 UTC

[1/4] kudu git commit: KUDU-2087. Fix failure to map Kerberos principal to username with FreeIPA

Repository: kudu
Updated Branches:
  refs/heads/master 57a1962aa -> de43ffe77


KUDU-2087. Fix failure to map Kerberos principal to username with FreeIPA

FreeIPA is a piece of software that automates and simplifies management
of MIT krb5, SSSD, some LDAP service, etc. FreeIPA configures a
localauth plugin[1] in krb5.conf to map Kerberos principals to local
usernames. In this configuration, Kudu daemons were failing to start up
due to failure to map their own service principals back to a username.
This is due to a number of issues:

1) FreeIPA distinguishes between service principals and user principals
and doesn't store a 'uid' field in LDAP for service principals. Thus,
when 'sssd' tries to map a service principal to a local unix user, it
determines that there is no such user (ie getpwnam() fails). This is by
design, best I can tell.

2) sssd's implementation of krb5_auth_to_localname[1] uses getpwnam to try
to map the kerberos principal to the local username. Because of the
above, it fails for service principals.

3) Prior to el7.3, ssd configures krb5 with 'enable_only = sssd' in the
localauth plugin section. This means that if sssd fails to perform the
mapping, it does not fall back to other mappings defined in krb5.conf
(eg explicitly defined auth_to_local rules). See [2]

4) Even after 7.3, there is an additional bug in sssd which I just
filed[3], which causes the fallback to still not work. Because of this,
we're getting the KRB5_PLUGIN_NO_HANDLE error code back up at the Kudu
layer.

We already have our own fallback case for KRB5_LNAME_NO_TRANS, and it
seems like we should just be handling PLUGIN_NO_HANDLE in the same way
to workaround the above behavior.

I tested this patch on a FreeIPA-configured system on el6.7. I was able
to successfully start a master with a FreeIPA-provided keytab and
authentication required, and use 'kudu table list' to authenticate to
it.

[1] https://github.com/SSSD/sssd/blob/master/src/krb5_plugin/sssd_krb5_localauth_plugin.c
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1297462
[3] https://pagure.io/SSSD/sssd/issue/3459

Change-Id: I7bc13b33053a73784350c9d30a3796a96d318c04
Reviewed-on: http://gerrit.cloudera.org:8080/7551
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/ed827e0f
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/ed827e0f
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/ed827e0f

Branch: refs/heads/master
Commit: ed827e0f0c23f154c06c43b4d43219cdd321e221
Parents: 57a1962
Author: Todd Lipcon <to...@apache.org>
Authored: Mon Jul 31 18:58:26 2017 -0700
Committer: Todd Lipcon <to...@apache.org>
Committed: Tue Aug 1 20:54:57 2017 +0000

----------------------------------------------------------------------
 src/kudu/security/init.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/ed827e0f/src/kudu/security/init.cc
----------------------------------------------------------------------
diff --git a/src/kudu/security/init.cc b/src/kudu/security/init.cc
index cb1569c..25e229f 100644
--- a/src/kudu/security/init.cc
+++ b/src/kudu/security/init.cc
@@ -435,9 +435,13 @@ Status MapPrincipalToLocalName(const std::string& principal, std::string* local_
   // first component of the principal.
   rc = KRB5_LNAME_NOTRANS;
 #endif
-  if (rc == KRB5_LNAME_NOTRANS) {
+  if (rc == KRB5_LNAME_NOTRANS || rc == KRB5_PLUGIN_NO_HANDLE) {
     // No name mapping specified. We fall back to simply taking the first component
     // of the principal, for compatibility with the default behavior of Hadoop.
+    //
+    // NOTE: KRB5_PLUGIN_NO_HANDLE isn't typically expected here, but works around
+    // a bug in SSSD's auth_to_local implementation: https://pagure.io/SSSD/sssd/issue/3459
+    //
     // TODO(todd): we should support custom configured auth-to-local mapping, since
     // most Hadoop ecosystem components do not load them from krb5.conf.
     if (princ->length > 0) {


[2/4] kudu git commit: [tools] Address potential flakiness of TestMoveTablet

Posted by to...@apache.org.
[tools] Address potential flakiness of TestMoveTablet

There's a few potential sources of flakiness in TestMoveTablet
and in the move tablet tool itself:
1. Leader step down may elect the same leader over and over,
preventing it from being removed from the config
2. With ongoing writes, the replicas may not all agree even though
they are all making progress
3. Sometimes leader changes, etc time out

This patch addresses these problems:
1. After step down, the former leader will snooze its failure
detector for 2x a normal election timeout
2. The tool waits for the new replica to see the new config
instead of for all servers to agree (since that's what we were
really waiting for anyway)
3. Timeouts have been bumped up to 30s, which is the norm for
similar tests.

Change-Id: Ic3aa5d816b403818f69baa71cf40b35b82ff9096
Reviewed-on: http://gerrit.cloudera.org:8080/7533
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/af343140
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/af343140
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/af343140

Branch: refs/heads/master
Commit: af343140b081bf5b324198c2c840d0da4c7ffb40
Parents: ed827e0
Author: Will Berkeley <wd...@apache.org>
Authored: Fri Jul 28 08:18:32 2017 -0700
Committer: Todd Lipcon <to...@apache.org>
Committed: Tue Aug 1 21:00:11 2017 +0000

----------------------------------------------------------------------
 src/kudu/consensus/raft_consensus.cc |  5 +++++
 src/kudu/tools/kudu-admin-test.cc    |  5 +++--
 src/kudu/tools/tool_action_tablet.cc | 12 ++++++------
 3 files changed, 14 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/af343140/src/kudu/consensus/raft_consensus.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/raft_consensus.cc b/src/kudu/consensus/raft_consensus.cc
index 1ebd0ef..cd0b9dc 100644
--- a/src/kudu/consensus/raft_consensus.cc
+++ b/src/kudu/consensus/raft_consensus.cc
@@ -503,6 +503,11 @@ Status RaftConsensus::StepDown(LeaderStepDownResponsePB* resp) {
     return Status::OK();
   }
   RETURN_NOT_OK(BecomeReplicaUnlocked());
+
+  // Snooze the failure detector for an extra leader failure timeout.
+  // This should ensure that a different replica is elected leader after this one steps down.
+  WARN_NOT_OK(SnoozeFailureDetector(MinimumElectionTimeout(), ALLOW_LOGGING),
+              "unable to snooze failure detector after stepping down");
   return Status::OK();
 }
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/af343140/src/kudu/tools/kudu-admin-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/kudu-admin-test.cc b/src/kudu/tools/kudu-admin-test.cc
index 379959d..7237548 100644
--- a/src/kudu/tools/kudu-admin-test.cc
+++ b/src/kudu/tools/kudu-admin-test.cc
@@ -51,6 +51,7 @@ using itest::TServerDetails;
 using itest::WAIT_FOR_LEADER;
 using itest::WaitForReplicasReportedToMaster;
 using itest::WaitForServersToAgree;
+using itest::WaitUntilCommittedConfigNumVotersIs;
 using itest::WaitUntilCommittedOpIdIndexIs;
 using itest::WaitUntilTabletInState;
 using itest::WaitUntilTabletRunning;
@@ -236,8 +237,8 @@ TEST_F(AdminCliTest, TestMoveTablet) {
     for (const string& uuid : active_tservers) {
       InsertOrDie(&active_tservers_map, uuid, tablet_servers_[uuid]);
     }
-    ASSERT_OK(WaitForServersToAgree(MonoDelta::FromSeconds(10), active_tservers_map,
-                                    tablet_id_, 1));
+    ASSERT_OK(WaitUntilCommittedConfigNumVotersIs(FLAGS_num_replicas, active_tservers_map[add],
+                                                  tablet_id_, MonoDelta::FromSeconds(30)));
   }
   workload.StopAndJoin();
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/af343140/src/kudu/tools/tool_action_tablet.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/tool_action_tablet.cc b/src/kudu/tools/tool_action_tablet.cc
index fefc43a..5eba3f6 100644
--- a/src/kudu/tools/tool_action_tablet.cc
+++ b/src/kudu/tools/tool_action_tablet.cc
@@ -45,7 +45,7 @@
 
 DEFINE_int64(move_copy_timeout_sec, 600,
              "Number of seconds to wait for tablet copy to complete when relocating a tablet");
-DEFINE_int64(move_leader_timeout_sec, 10,
+DEFINE_int64(move_leader_timeout_sec, 30,
              "Number of seconds to wait for a leader when relocating a leader tablet");
 
 namespace kudu {
@@ -365,13 +365,13 @@ Status MoveReplica(const RunnerContext &context) {
   const string& master_addresses_str = FindOrDie(context.required_args, kMasterAddressesArg);
   vector<string> master_addresses = strings::Split(master_addresses_str, ",");
   const string& tablet_id = FindOrDie(context.required_args, kTabletIdArg);
-  const string& rem_replica_uuid = FindOrDie(context.required_args, kFromTsUuidArg);
-  const string& add_replica_uuid = FindOrDie(context.required_args, kToTsUuidArg);
+  const string& from_ts_uuid = FindOrDie(context.required_args, kFromTsUuidArg);
+  const string& to_ts_uuid = FindOrDie(context.required_args, kToTsUuidArg);
 
   // Check the tablet is in perfect health and, if so, add the new server.
   RETURN_NOT_OK_PREPEND(DoKsckForTablet(master_addresses, tablet_id),
                         "ksck pre-move health check failed");
-  RETURN_NOT_OK(DoChangeConfig(master_addresses, tablet_id, add_replica_uuid,
+  RETURN_NOT_OK(DoChangeConfig(master_addresses, tablet_id, to_ts_uuid,
                                RaftPeerPB::VOTER, consensus::ADD_SERVER));
 
   // Wait until the tablet copy completes and the tablet returns to perfect health.
@@ -386,13 +386,13 @@ Status MoveReplica(const RunnerContext &context) {
   string leader_uuid;
   HostPort leader_hp;
   RETURN_NOT_OK(GetTabletLeader(client, tablet_id, &leader_uuid, &leader_hp));
-  if (rem_replica_uuid == leader_uuid) {
+  if (from_ts_uuid == leader_uuid) {
     RETURN_NOT_OK_PREPEND(ChangeLeader(client, tablet_id,
                                        leader_uuid, leader_hp,
                                        MonoDelta::FromSeconds(FLAGS_move_leader_timeout_sec)),
                           "failed changing leadership from the replica to be removed");
   }
-  return DoChangeConfig(master_addresses, tablet_id, rem_replica_uuid,
+  return DoChangeConfig(master_addresses, tablet_id, from_ts_uuid,
                         boost::none, consensus::REMOVE_SERVER);
 }
 


[4/4] kudu git commit: [docs] guide on assertions in the Java code

Posted by to...@apache.org.
[docs] guide on assertions in the Java code

Added guide on using assert and Guava Preconditions in the Kudu Java
client code. That's a compilation of the information from the following
e-mail thread:

  https://lists.apache.org/thread.html/13e39d1c4e632c5fcc134097a045fe89f5a2955ac3838a48e4e38bc2@%3Cdev.kudu.apache.org%3E

Also, separated the CMake style guide from the C++ code style section.

Change-Id: I78c249021f9eb9a9a94cdf1ff1b2dae94561c2fd
Reviewed-on: http://gerrit.cloudera.org:8080/7549
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Todd Lipcon <to...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/de43ffe7
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/de43ffe7
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/de43ffe7

Branch: refs/heads/master
Commit: de43ffe778d47e458037cf7ed950899fdb65ab83
Parents: ec93064
Author: Alexey Serbin <as...@cloudera.com>
Authored: Mon Jul 31 22:28:04 2017 -0700
Committer: Todd Lipcon <to...@apache.org>
Committed: Tue Aug 1 21:09:57 2017 +0000

----------------------------------------------------------------------
 docs/contributing.adoc | 83 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 64 insertions(+), 19 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/de43ffe7/docs/contributing.adoc
----------------------------------------------------------------------
diff --git a/docs/contributing.adoc b/docs/contributing.adoc
index 0dd15fd..26a0147 100644
--- a/docs/contributing.adoc
+++ b/docs/contributing.adoc
@@ -140,7 +140,7 @@ submit a patch to the review, even if you were not the original reviewer.
 Gerrit allows you to vote on a review. A vote of `+2` from at least one committer
 (besides the submitter) is required before the patch can be merged.
 
-== Code Style
+== {cpp} Code Style
 
 Get familiar with these guidelines so that your contributions can be reviewed and
 integrated quickly and easily.
@@ -283,24 +283,6 @@ and decremented when the `Callback` goes out of scope.
 See the large file comment in _gutil/callback.h_ for more details, and
 _util/callback_bind-test.cc_ for examples.
 
-=== `CMake` Style Guide
-
-`CMake` allows commands in lower, upper, or mixed case. To keep
-the CMake files consistent, please use the following guidelines:
-
-* *built-in commands* in lowercase
-----
-add_subdirectory(some/path)
-----
-* *built-in arguments* in uppercase
-----
-message(STATUS "message goes here")
-----
-* *custom commands or macros* in uppercase
-----
-ADD_KUDU_TEST(some-test)
-----
-
 === GFlags
 
 Kudu uses gflags for both command-line and file-based configuration. Use these guidelines
@@ -362,6 +344,69 @@ comment in _flag_tags.h_ for guidelines.
   creating `foo_wal_dir` and `bar_wal_dir` gflags, better to have a single
   `kudu_wal_dir` gflag for use universally.
 
+== Java Code Style
+
+=== Preconditions vs assert in the Kudu Java client
+
+Use `assert` for verification of the static (i.e. non-runtime) internal
+invariants. Internal means the pre- and post-conditions which are
+completely under control of the code of a class or a function itself and cannot
+be influenced by input parameters and other runtime/dynamic conditions.
+
+Use `Preconditions` for verification of the input parameters and the other
+conditions which are outside of the control of the local code, or conditions
+which are dependent on the state of other objects/components in runtime.
+
+[source,java]
+----
+Object pop() {
+  // Use Preconditions here because the external user of the class should not
+  // call pop() on an empty stack, but the stack itself is internally consistent
+  Preconditions.checkState(curSize > 0, "queue must not be empty");
+  Object toReturn = data[--curSize];
+  // Use an assert here because if we ended up with a negative size counter,
+  // that's an indication of a broken implementation of the stack; i.e. it's
+  // an invariant, not a state check.
+  assert curSize >= 0;
+  return toReturn;
+}
+----
+
+However, keep in mind that `assert` checks are enabled only when the JVM is
+run with `-ea` option. So, if some dynamic condition is crucial for the
+overall consistency (e.g. a data loss can occur if some dynamic condition is not
+satisfied and the code continues its execution), consider throwing an
+`AssertionError`:
+
+[source,java]
+----
+if (!isCriticalConditionSatisfied) {
+  throw new AssertionError("cannot continue: data loss is possible otherwise");
+}
+----
+
+==== References
+* link:https://docs.oracle.com/javase/8/docs/technotes/guides/language/assert.html[Programming With Assertions]
+* link:https://github.com/google/guava/wiki/PreconditionsExplained[Guava Preconditions Explained]
+
+== `CMake` Style Guide
+
+`CMake` allows commands in lower, upper, or mixed case. To keep
+the CMake files consistent, please use the following guidelines:
+
+* *built-in commands* in lowercase
+----
+add_subdirectory(some/path)
+----
+* *built-in arguments* in uppercase
+----
+message(STATUS "message goes here")
+----
+* *custom commands or macros* in uppercase
+----
+ADD_KUDU_TEST(some-test)
+----
+
 == Testing
 
 All new code should have tests.::


[3/4] kudu git commit: [docs] fixed typos in raft-config-change.md

Posted by to...@apache.org.
[docs] fixed typos in raft-config-change.md

Change-Id: Ie1f43915b5f2a393957bc0d6b9e120f7419c72b1
Reviewed-on: http://gerrit.cloudera.org:8080/7548
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/ec93064c
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/ec93064c
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/ec93064c

Branch: refs/heads/master
Commit: ec93064c921b56d0a6c79427eb313d5d1f5c24f1
Parents: af34314
Author: Alexey Serbin <as...@cloudera.com>
Authored: Mon Jul 31 21:12:53 2017 -0700
Committer: Todd Lipcon <to...@apache.org>
Committed: Tue Aug 1 21:08:20 2017 +0000

----------------------------------------------------------------------
 docs/design-docs/raft-config-change.md | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/ec93064c/docs/design-docs/raft-config-change.md
----------------------------------------------------------------------
diff --git a/docs/design-docs/raft-config-change.md b/docs/design-docs/raft-config-change.md
index d598832..728fe4b 100644
--- a/docs/design-docs/raft-config-change.md
+++ b/docs/design-docs/raft-config-change.md
@@ -20,12 +20,11 @@ tablet. The use cases for this functionality include:
 
 * Replacing a failed tablet server to maintain the desired replication
   factor of tablet data.
-* Growing the Kudu cluster over time. "Rebalancing" tablet locations to even
-* out the load across tablet
-  servers.
+* Growing the Kudu cluster over time. This might need "rebalancing" tablet
+  locations to even out the load across tablet servers.
 * Increasing the replication of one or more tablets of a table if they
-  become hot (eg in a time series workload, making today’s partitions have a
-  higher replication)
+  become hot (e.g. in a time series workload, making today’s partitions have a
+  higher replication).
 
 ## Scope
 This document covers the following topics:
@@ -236,7 +235,7 @@ and replay all data, it may be tens of minutes or even hours before regaining
 fault tolerance.
 
 As an example, consider the case of a four-node cluster, each node having 1TB
-of replica data. If a node fails, then its 1TB worth of data must be transfered
+of replica data. If a node fails, then its 1TB worth of data must be transferred
 among the remaining nodes, so we need to wait for 300+GB of data to transfer,
 which could take up to an hour. During that hour, we would have no
 latency-leveling on writes unless we did something like the above.