You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by aw...@apache.org on 2019/02/27 02:14:43 UTC

[kudu] branch branch-1.9.x updated (a2f0ba7 -> deb0f04)

This is an automated email from the ASF dual-hosted git repository.

awong pushed a change to branch branch-1.9.x
in repository https://gitbox.apache.org/repos/asf/kudu.git.


    from a2f0ba7  KUDU-2690: don't roll log schema on failed alter
     new 98237d2  test: deflake TabletReplicaTest.TestRollLogSegmentSchemaOnAlter
     new f4f4f81  KUDU-2710: Fix KeepAliveRequest retries
     new deb0f04  KUDU-2706: Work around the lack of thread safety in krb5_parse_name()

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../org/apache/kudu/client/AsyncKuduScanner.java    |  6 ++++++
 .../main/java/org/apache/kudu/client/RpcProxy.java  | 21 +++++++++++++++++++++
 .../java/org/apache/kudu/client/TestKuduClient.java | 10 +++++++++-
 src/kudu/security/init.cc                           | 12 ++++++++++++
 src/kudu/tablet/tablet_replica-test.cc              |  3 +++
 5 files changed, 51 insertions(+), 1 deletion(-)


[kudu] 03/03: KUDU-2706: Work around the lack of thread safety in krb5_parse_name()

Posted by aw...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

awong pushed a commit to branch branch-1.9.x
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit deb0f04b010c5399771879829285c284b8f20008
Author: Michael Ho <kw...@cloudera.com>
AuthorDate: Wed Feb 20 16:51:26 2019 -0800

    KUDU-2706: Work around the lack of thread safety in krb5_parse_name()
    
    krb5_init_context() sets the field 'default_realm' in a krb5_context
    object to 0. Upon first call to krb5_parse_name() with a principal
    without realm specified (e.g. foo/bar), 'default_realm' in the
    krb5_context object is lazily initialized.
    
    When more than one negotiation threads are configured, it's possible
    for multiple threads to call CanonicalizeKrb5Principal() in parallel.
    CanonicalizeKrb5Principal() in turn calls krb5_parse_name(g_krb5_ctx, ...)
    with no lock held. In addition, krb5_parse_name() is not thread safe as
    it lazily initializes 'context->default_realm' without holding lock.
    Consequently, 'g_krb5_ctx' which is shared and not supposed to be modified
    after initialization may be inadvertently modified concurrently by multiple
    threads, leading to crashes (e.g. double free) or errors.
    
    This change works around the problem by initializing 'g_krb5_ctx->default_realm'
    once in InitKrb5Ctx() by calling krb5_get_default_realm().
    
    TODO: Fix unsafe sharing of 'g_krb5_ctx'. According to Kerberos documentation
    (https://github.com/krb5/krb5/blob/master/doc/threads.txt), any use of krb5_context
    must be confined to one thread at a time by the application code. The current
    sharing of 'g_krb5_ctx' between threads without synchronization is in fact unsafe.
    
    Change-Id: I1bf9224516e2996f51f319088179727f76741ebe
    Reviewed-on: http://gerrit.cloudera.org:8080/12545
    Reviewed-by: Alexey Serbin <as...@cloudera.com>
    Tested-by: Kudu Jenkins
    (cherry picked from commit 25af98eaf4c712bef9033721ea58b3f0d0a78c32)
    Reviewed-on: http://gerrit.cloudera.org:8080/12607
    Reviewed-by: Andrew Wong <aw...@cloudera.com>
    Tested-by: Andrew Wong <aw...@cloudera.com>
---
 src/kudu/security/init.cc | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/kudu/security/init.cc b/src/kudu/security/init.cc
index 58ef83c..200411f 100644
--- a/src/kudu/security/init.cc
+++ b/src/kudu/security/init.cc
@@ -159,6 +159,18 @@ void InitKrb5Ctx() {
   static std::once_flag once;
   std::call_once(once, [&]() {
       CHECK_EQ(krb5_init_context(&g_krb5_ctx), 0);
+      // Work around the lack of thread safety in krb5_parse_name() by implicitly
+      // initializing g_krb5_ctx->default_realm once. The assumption is that this
+      // function is called once in a single thread environment during initialization.
+      //
+      // TODO(KUDU-2706): Fix unsafe sharing of 'g_krb5_ctx'.
+      // According to Kerberos documentation
+      // (https://github.com/krb5/krb5/blob/master/doc/threads.txt), any use of
+      // krb5_context must be confined to one thread at a time by the application code.
+      // The current way of sharing of 'g_krb5_ctx' between threads is actually unsafe.
+      char* unused_realm;
+      CHECK_EQ(krb5_get_default_realm(g_krb5_ctx, &unused_realm), 0);
+      krb5_free_default_realm(g_krb5_ctx, unused_realm);
     });
 }
 


[kudu] 01/03: test: deflake TabletReplicaTest.TestRollLogSegmentSchemaOnAlter

Posted by aw...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

awong pushed a commit to branch branch-1.9.x
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit 98237d2c85eb3ef8fe501b9ad623eb2213b0b327
Author: Andrew Wong <aw...@cloudera.com>
AuthorDate: Tue Feb 26 13:06:52 2019 -0800

    test: deflake TabletReplicaTest.TestRollLogSegmentSchemaOnAlter
    
    We should wait for the replica to become leader so it's usable for
    writes. Without this fix, the test failed 81/1000 times in release mode,
    and passed 1000/1000 with it.
    
    Change-Id: Ie6016f717183ca7bcf98c0874a24f1463e113cef
    Reviewed-on: http://gerrit.cloudera.org:8080/12597
    Reviewed-by: Adar Dembo <ad...@cloudera.com>
    Tested-by: Kudu Jenkins
---
 src/kudu/tablet/tablet_replica-test.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/kudu/tablet/tablet_replica-test.cc b/src/kudu/tablet/tablet_replica-test.cc
index 4cf9f03..a0960cc 100644
--- a/src/kudu/tablet/tablet_replica-test.cc
+++ b/src/kudu/tablet/tablet_replica-test.cc
@@ -233,6 +233,9 @@ class TabletReplicaTest : public KuduTabletTest {
                                      scoped_refptr<ResultTracker>(),
                                      log,
                                      prepare_pool_.get()));
+    // Wait for the replica to be usable.
+    const MonoDelta kTimeout = MonoDelta::FromSeconds(30);
+    ASSERT_OK(tablet_replica_->consensus()->WaitUntilLeaderForTests(kTimeout));
   }
 
  protected:


[kudu] 02/03: KUDU-2710: Fix KeepAliveRequest retries

Posted by aw...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

awong pushed a commit to branch branch-1.9.x
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit f4f4f81eafd3caa303b89eaa3ac4ca2448df5867
Author: Grant Henke <gr...@apache.org>
AuthorDate: Mon Feb 25 21:17:01 2019 -0600

    KUDU-2710: Fix KeepAliveRequest retries
    
    Fixes KeepAliveRequest retries by adding a partitionKey
    implementation. Without this a null partitionKey is passed
    and the client treats this as a master table.
    
    A follow up patch should include fixes to prevent issues
    like this in the future and fix any remaining retry issues.
    This patch is kept small to ensure easy backports.
    
    Change-Id: I951212ab76079e5788c2870223b45782b16509e7
    Reviewed-on: http://gerrit.cloudera.org:8080/12586
    Tested-by: Kudu Jenkins
    Reviewed-by: Alexey Serbin <as...@cloudera.com>
    Reviewed-by: Adar Dembo <ad...@cloudera.com>
    (cherry picked from commit 6302811eb73efdfd2a3da84c25f5d6589302dee1)
    Reviewed-on: http://gerrit.cloudera.org:8080/12608
    Reviewed-by: Grant Henke <gr...@apache.org>
    Tested-by: Andrew Wong <aw...@cloudera.com>
---
 .../org/apache/kudu/client/AsyncKuduScanner.java    |  6 ++++++
 .../main/java/org/apache/kudu/client/RpcProxy.java  | 21 +++++++++++++++++++++
 .../java/org/apache/kudu/client/TestKuduClient.java | 10 +++++++++-
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java b/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
index 36bb0af..fd085b4 100644
--- a/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
+++ b/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
@@ -864,6 +864,12 @@ public final class AsyncKuduScanner {
     }
 
     @Override
+    public byte[] partitionKey() {
+      // This key is used to lookup where the request needs to go
+      return pruner.nextPartitionKey();
+    }
+
+    @Override
     Pair<Void, Object> deserialize(final CallResponse callResponse,
                                    String tsUUID) throws KuduException {
       ScannerKeepAliveResponsePB.Builder builder = ScannerKeepAliveResponsePB.newBuilder();
diff --git a/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java b/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
index 347e3c8..5a2728d 100644
--- a/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
+++ b/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
@@ -60,6 +60,9 @@ class RpcProxy {
 
   private static final Logger LOG = LoggerFactory.getLogger(RpcProxy.class);
 
+  private static int staticNumFail = 0;
+  private static Exception staticException = null;
+
   /** The reference to the top-level Kudu client object. */
   @Nonnull
   private final AsyncKuduClient client;
@@ -90,6 +93,18 @@ class RpcProxy {
   }
 
   /**
+   * Fails the next numFail RPCs by throwing the passed exception.
+   * @param numFail the number of RPCs to fail
+   * @param exception the exception to throw when failing an rpc
+   */
+  @InterfaceAudience.LimitedPrivate("Test")
+  static void failNextRpcs(int numFail, Exception exception) {
+    Preconditions.checkNotNull(exception);
+    staticNumFail = numFail;
+    staticException = exception;
+  }
+
+  /**
    * Send the specified RPC using the connection to the Kudu server.
    *
    * @param <R> type of the RPC
@@ -101,6 +116,12 @@ class RpcProxy {
                           final Connection connection,
                           final KuduRpc<R> rpc) {
     try {
+      // Throw an exception to enable testing failures. See `failNextRpcs`.
+      if (staticNumFail > 0) {
+        staticNumFail--;
+        LOG.warn("Forcing a failure on sendRpc: " + rpc);
+        throw staticException;
+      }
       if (!rpc.getRequiredFeatures().isEmpty()) {
         // An extra optimization: when the peer's features are already known, check that the server
         // supports feature flags, if those are required.
diff --git a/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java b/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
index 303f53a..576ec88 100644
--- a/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
+++ b/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
@@ -47,6 +47,7 @@ import java.math.BigDecimal;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
+import java.util.Random;
 import java.util.concurrent.Callable;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
@@ -55,10 +56,11 @@ import java.util.concurrent.Future;
 import com.google.common.collect.ImmutableList;
 import com.stumbleupon.async.Deferred;
 
+import org.apache.kudu.test.ClientTestUtil;
 import org.apache.kudu.test.KuduTestHarness;
 import org.apache.kudu.test.KuduTestHarness.LocationConfig;
 import org.apache.kudu.test.KuduTestHarness.TabletServerConfig;
-import org.apache.kudu.test.ClientTestUtil;
+import org.apache.kudu.test.RandomUtils;
 import org.apache.kudu.util.TimestampUtil;
 import org.junit.Before;
 import org.junit.Rule;
@@ -286,8 +288,14 @@ public class TestKuduClient {
     // Wait for longer than the scanner ttl calling keepAlive throughout.
     // Each loop sleeps 25% of the scanner ttl and we loop 10 times to ensure
     // we extend over 2x the scanner ttl.
+    Random random = RandomUtils.getRandom();
     for (int i = 0; i < 10; i++) {
       Thread.sleep(SHORT_SCANNER_TTL_MS / 4);
+      // Force 1/3 of the keepAlive requests to retry up to 3 times.
+      if (i % 3 == 0) {
+        RpcProxy.failNextRpcs(random.nextInt(4),
+            new RecoverableException(Status.ServiceUnavailable("testKeepAlive")));
+      }
       scanner.keepAlive();
     }