You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by to...@apache.org on 2017/11/21 18:25:56 UTC

[1/3] kudu git commit: Mark hms_client-test as RUN_SERIAL to avoid timeouts

Repository: kudu
Updated Branches:
  refs/heads/master a4642a41e -> 3e59fd7b1


Mark hms_client-test as RUN_SERIAL to avoid timeouts

It seems this test often times out when running in parallel with others.
Let's see if running it serially helps.

Change-Id: Idc152579d45ec4c945f1bc60e71af3c30ded645f
Reviewed-on: http://gerrit.cloudera.org:8080/8566
Reviewed-by: Dan Burkert <da...@apache.org>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/63b53ad2
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/63b53ad2
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/63b53ad2

Branch: refs/heads/master
Commit: 63b53ad2c9a26a3a9eab348471d4f247eb228315
Parents: a4642a4
Author: Todd Lipcon <to...@apache.org>
Authored: Wed Nov 15 21:04:01 2017 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Tue Nov 21 04:37:34 2017 +0000

----------------------------------------------------------------------
 src/kudu/hms/CMakeLists.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/63b53ad2/src/kudu/hms/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/src/kudu/hms/CMakeLists.txt b/src/kudu/hms/CMakeLists.txt
index e9f92a5..e449047 100644
--- a/src/kudu/hms/CMakeLists.txt
+++ b/src/kudu/hms/CMakeLists.txt
@@ -73,5 +73,7 @@ if (NOT NO_TESTS)
     mini_hms
     ${KUDU_MIN_TEST_LIBS})
 
-  ADD_KUDU_TEST(hms_client-test)
+  # This test has to run serially since otherwise starting the HMS can take a very
+  # long time.
+  ADD_KUDU_TEST(hms_client-test RUN_SERIAL true)
 endif()


[3/3] kudu git commit: KUDU-2220: GetEndOfChainX509 does not return end-user cert

Posted by to...@apache.org.
KUDU-2220: GetEndOfChainX509 does not return end-user cert

KUDU-2091 introduced a function GetEndOfChainX509() which was supposed
to return the "end-user" certificate. However, the end-user certificate
is not at the end of the chain, but rather at the beginning of the chain
as specificed by the RFC:
https://tools.ietf.org/html/rfc5246#section-7.4.2

  | This is a sequence (chain) of certificates. The sender's certificate MUST
  | come first in the list. Each following certificate MUST directly certify
  | the one preceding it.

This patch fixes this by changing the GetEndOfChainX509() to
GetTopOfChainX509(). An existing test is modified to test this patch. It does
not pass without this change.

Change-Id: I0e3f913259ec4c855ff211726fa6ecea94d328e7
Reviewed-on: http://gerrit.cloudera.org:8080/8595
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Todd Lipcon <to...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/3e59fd7b
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/3e59fd7b
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/3e59fd7b

Branch: refs/heads/master
Commit: 3e59fd7b14b4a2ba2846621df04093cce9024688
Parents: 36a5e76
Author: Sailesh Mukil <sa...@apache.org>
Authored: Sun Nov 19 15:33:30 2017 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Tue Nov 21 18:24:28 2017 +0000

----------------------------------------------------------------------
 src/kudu/rpc/rpc-test.cc                |  2 +-
 src/kudu/security/ca/cert_management.cc |  2 +-
 src/kudu/security/cert.cc               | 24 ++++++++--------
 src/kudu/security/cert.h                |  4 +--
 src/kudu/security/test/test_certs.cc    | 43 ++++++++++++++++++++++++++++
 src/kudu/security/tls_context.cc        | 10 +++----
 src/kudu/security/tls_handshake.cc      |  2 +-
 7 files changed, 65 insertions(+), 22 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/3e59fd7b/src/kudu/rpc/rpc-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/rpc/rpc-test.cc b/src/kudu/rpc/rpc-test.cc
index 6ccd92d..aa92c13 100644
--- a/src/kudu/rpc/rpc-test.cc
+++ b/src/kudu/rpc/rpc-test.cc
@@ -184,7 +184,7 @@ TEST_P(TestRpc, TestCall) {
   }
 }
 
-TEST_P(TestRpc, TestCallWithChainCert) {
+TEST_P(TestRpc, TestCallWithChainCerts) {
   bool enable_ssl = GetParam();
   // We're only interested in running this test with TLS enabled.
   if (!enable_ssl) return;

http://git-wip-us.apache.org/repos/asf/kudu/blob/3e59fd7b/src/kudu/security/ca/cert_management.cc
----------------------------------------------------------------------
diff --git a/src/kudu/security/ca/cert_management.cc b/src/kudu/security/ca/cert_management.cc
index 5144216..624958f 100644
--- a/src/kudu/security/ca/cert_management.cc
+++ b/src/kudu/security/ca/cert_management.cc
@@ -394,7 +394,7 @@ Status CertSigner::DoSign(const EVP_MD* digest, int32_t exp_seconds,
 
   // If we have a CA cert, then the CA is the issuer.
   // Otherwise, we are self-signing so the target cert is also the issuer.
-  X509* issuer_cert = ca_cert_ ? ca_cert_->GetEndOfChainX509() : ret;
+  X509* issuer_cert = ca_cert_ ? ca_cert_->GetTopOfChainX509() : ret;
   X509_NAME* issuer_name = X509_get_subject_name(issuer_cert);
   OPENSSL_RET_NOT_OK(X509_set_issuer_name(ret, issuer_name),
       "error setting issuer name");

http://git-wip-us.apache.org/repos/asf/kudu/blob/3e59fd7b/src/kudu/security/cert.cc
----------------------------------------------------------------------
diff --git a/src/kudu/security/cert.cc b/src/kudu/security/cert.cc
index 6fef349..7d00a4b 100644
--- a/src/kudu/security/cert.cc
+++ b/src/kudu/security/cert.cc
@@ -69,9 +69,9 @@ int GetKuduKerberosPrincipalOidNid() {
   return nid;
 }
 
-X509* Cert::GetEndOfChainX509() const {
+X509* Cert::GetTopOfChainX509() const {
   CHECK_GT(chain_len(), 0);
-  return sk_X509_value(data_.get(), chain_len() - 1);
+  return sk_X509_value(data_.get(), 0);
 }
 
 Status Cert::FromString(const std::string& data, DataFormat format) {
@@ -95,16 +95,16 @@ Status Cert::FromFile(const std::string& fpath, DataFormat format) {
 }
 
 string Cert::SubjectName() const {
-  return X509NameToString(X509_get_subject_name(GetEndOfChainX509()));
+  return X509NameToString(X509_get_subject_name(GetTopOfChainX509()));
 }
 
 string Cert::IssuerName() const {
-  return X509NameToString(X509_get_issuer_name(GetEndOfChainX509()));
+  return X509NameToString(X509_get_issuer_name(GetTopOfChainX509()));
 }
 
 boost::optional<string> Cert::UserId() const {
   SCOPED_OPENSSL_NO_PENDING_ERRORS;
-  X509_NAME* name = X509_get_subject_name(GetEndOfChainX509());
+  X509_NAME* name = X509_get_subject_name(GetTopOfChainX509());
   char buf[1024];
   int len = X509_NAME_get_text_by_NID(name, NID_userId, buf, arraysize(buf));
   if (len < 0) return boost::none;
@@ -115,7 +115,7 @@ vector<string> Cert::Hostnames() const {
   SCOPED_OPENSSL_NO_PENDING_ERRORS;
   vector<string> result;
   auto gens = ssl_make_unique(reinterpret_cast<GENERAL_NAMES*>(X509_get_ext_d2i(
-      GetEndOfChainX509(), NID_subject_alt_name, nullptr, nullptr)));
+      GetTopOfChainX509(), NID_subject_alt_name, nullptr, nullptr)));
   if (gens) {
     for (int i = 0; i < sk_GENERAL_NAME_num(gens.get()); ++i) {
       GENERAL_NAME* gen = sk_GENERAL_NAME_value(gens.get(), i);
@@ -135,9 +135,9 @@ vector<string> Cert::Hostnames() const {
 
 boost::optional<string> Cert::KuduKerberosPrincipal() const {
   SCOPED_OPENSSL_NO_PENDING_ERRORS;
-  int idx = X509_get_ext_by_NID(GetEndOfChainX509(), GetKuduKerberosPrincipalOidNid(), -1);
+  int idx = X509_get_ext_by_NID(GetTopOfChainX509(), GetKuduKerberosPrincipalOidNid(), -1);
   if (idx < 0) return boost::none;
-  X509_EXTENSION* ext = X509_get_ext(GetEndOfChainX509(), idx);
+  X509_EXTENSION* ext = X509_get_ext(GetTopOfChainX509(), idx);
   ASN1_OCTET_STRING* octet_str = X509_EXTENSION_get_data(ext);
   const unsigned char* octet_str_data = octet_str->data;
   long len; // NOLINT(runtime/int)
@@ -153,7 +153,7 @@ boost::optional<string> Cert::KuduKerberosPrincipal() const {
 
 Status Cert::CheckKeyMatch(const PrivateKey& key) const {
   SCOPED_OPENSSL_NO_PENDING_ERRORS;
-  OPENSSL_RET_NOT_OK(X509_check_private_key(GetEndOfChainX509(), key.GetRawData()),
+  OPENSSL_RET_NOT_OK(X509_check_private_key(GetTopOfChainX509(), key.GetRawData()),
                      "certificate does not match private key");
   return Status::OK();
 }
@@ -164,12 +164,12 @@ Status Cert::GetServerEndPointChannelBindings(string* channel_bindings) const {
   // (hash) algorithm, and the public key type which signed the cert.
 
 #if OPENSSL_VERSION_NUMBER >= 0x10002000L
-  int signature_nid = X509_get_signature_nid(GetEndOfChainX509());
+  int signature_nid = X509_get_signature_nid(GetTopOfChainX509());
 #else
   // Older version of OpenSSL appear not to have a public way to get the
   // signature digest method from a certificate. Instead, we reach into the
   // 'private' internals.
-  int signature_nid = OBJ_obj2nid(GetEndOfChainX509()->sig_alg->algorithm);
+  int signature_nid = OBJ_obj2nid(GetTopOfChainX509()->sig_alg->algorithm);
 #endif
 
   // Retrieve the digest algorithm type.
@@ -252,7 +252,7 @@ void Cert::AdoptAndAddRefX509(X509* cert) {
 
 Status Cert::GetPublicKey(PublicKey* key) const {
   SCOPED_OPENSSL_NO_PENDING_ERRORS;
-  EVP_PKEY* raw_key = X509_get_pubkey(GetEndOfChainX509());
+  EVP_PKEY* raw_key = X509_get_pubkey(GetTopOfChainX509());
   OPENSSL_RET_IF_NULL(raw_key, "unable to get certificate public key");
   key->AdoptRawData(raw_key);
   return Status::OK();

http://git-wip-us.apache.org/repos/asf/kudu/blob/3e59fd7b/src/kudu/security/cert.h
----------------------------------------------------------------------
diff --git a/src/kudu/security/cert.h b/src/kudu/security/cert.h
index a64b8b8..9fe7a2d 100644
--- a/src/kudu/security/cert.h
+++ b/src/kudu/security/cert.h
@@ -94,8 +94,8 @@ class Cert : public RawDataWrapper<STACK_OF(X509)> {
   // Returns the end-user certificate's public key.
   Status GetPublicKey(PublicKey* key) const WARN_UNUSED_RESULT;
 
-  // Get the last certificate in the chain, otherwise known as the 'end-user' certificate.
-  X509* GetEndOfChainX509() const;
+  // Get the first certificate in the chain, otherwise known as the 'end-user' certificate.
+  X509* GetTopOfChainX509() const;
 };
 
 class CertSignRequest : public RawDataWrapper<X509_REQ> {

http://git-wip-us.apache.org/repos/asf/kudu/blob/3e59fd7b/src/kudu/security/test/test_certs.cc
----------------------------------------------------------------------
diff --git a/src/kudu/security/test/test_certs.cc b/src/kudu/security/test/test_certs.cc
index d7abb3e..bc82140 100644
--- a/src/kudu/security/test/test_certs.cc
+++ b/src/kudu/security/test/test_certs.cc
@@ -540,6 +540,49 @@ n6P1UwbFPiRGIzEAo0SSC1PRT8phv+5y0B1+gcj/peFymZVE+gRcrv9irVQqUpAY
 Lo9xrClAJ2xx4Ouz1GprKPoHdVyqtgcLXN4Oyi8Tehu96Zf6GytSEfTXsbQp+GgR
 TGRhKnDySjPhLp/uObfVwioyuAyA5mVCwjsZ/cvUUA==
 -----END CERTIFICATE-----
+-----BEGIN CERTIFICATE-----
+MIIHmDCCA4CgAwIBAgICEAAwDQYJKoZIhvcNAQEFBQAwVjELMAkGA1UEBhMCVVMx
+CzAJBgNVBAgMAkNBMQswCQYDVQQHDAJTRjENMAsGA1UECgwEQWNtZTENMAsGA1UE
+CwwES3VkdTEPMA0GA1UEAwwGUk9PVENBMB4XDTE3MDgxMTIxMzUzNVoXDTQ0MTIy
+NzIxMzUzNVowUTEXMBUGA1UEAwwOSW50ZXJtZWRpYXRlQ0ExCzAJBgNVBAgMAkNB
+MQswCQYDVQQGEwJVUzENMAsGA1UECgwEQWNtZTENMAsGA1UECwwES3VkdTCCAiIw
+DQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAM1X35LT/eBWBt0Uqqh3DSUyY3K8
+HLIlX3ZXg2Nx6y8yqhw5UGVFZl0uYBDo2DSlTl4sey+AxLIbpQI9ArRA+xqmFynV
+jheB9otudnA8hVwi/e9o+m+VSjG+HPRjSS5hwdPgpJG8DCPSmGyUUFtf3v0NxkUq
+Is+fB5qhQ36aQkI+MwQsSlHR+YrrKKVnE3f911wr9OScQP5KHjrZLQex8OmpWD9G
+v4P9jfVSUwmNEXXjmXDhNG/1R4ofX6HogZR6lBmRNGbcjjWRZQmPrOe9YcdkMLD0
+CdaUyKikqqW6Ilxs7scfuCGqwBWqh66tY18MBMHnt0bL26atTPduKYqulJ1pijio
+DUrzqtAzm7PirqPZ4aOJ9PNjdQs9zH3Zad3pcjfjpdKj4a/asX0st631J5jE6MLB
+LcbAerb/Csr/+tD0TOxwWlA+p/6wPb8ECflQLkvDDEY5BrRGdqYDpEOdm1F9DWQh
+y0RB8rWJMkxC/tTqYHfeaphzCxndLRsZQKVcPiqWCT7b431umIjPaDhsykNlcU3N
+f0V7V/fLY6wwuACngS0BLQuMrXy5FyhmWnUBeWwHfAeTxCkHlF+cVT6wHmeOuGbC
+c1piq7O7puKdC3UjO7Nn+WoOb2B6Qm/dajHpj5myxYJa5tGQGeUnWPwjjMQR557k
+HzugGAzkuG1ASQrhAgMBAAGjdTBzMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYE
+FPCnLX2qgKtPPOwh6g7CAtEZIO6EMB8GA1UdIwQYMBaAFE/9XKaDey5kC8f3bCeU
+HW46abboMAsGA1UdDwQEAwIBpjATBgNVHSUEDDAKBggrBgEFBQcDATANBgkqhkiG
+9w0BAQUFAAOCBAEAIaD2yzjTFdn61A4Qi+ek3fBJaDNQZytd0rHb49v3T+mdj/MI
+yShI1qezDFkg2FP1LfNgjuQl/T+g0BloXatAhZ/dj20Y8oN6bmilV+r2YLJbvbTn
+3hI+MxNf3Ue3FmIrwKK3QdkWcDBURpyYaDO71oxPl9QNfdhWCGHB/oWKU2y4Qt/O
+aPy+CmBAZEclX+hsdUBDJG5vuujpv4myCFwpLgFKNQX3XqCPLc4SRjfyla2YmeZv
+j7KKYh8XOWBbBF0BnWD94WzUDIBmFlUfS32aJTvd7tVaWXwH8rGwDfLN8i05UD9G
+zc3uuFH+UdzWVymk/4svKIPlB2nw9vPV8hvRRah0yFN3EQqAF0vQtwVJF/VwtZdg
+ahH0DykYTf7cKtFXE40xB7YgwDLXd3UiXfo3USW28uKqsrO52xYuUTBn+xkilds1
+tNKwtpXFWP2PUk92ficxoqi1cJnHxIIt5HKskFPgfIpzkpR8IM/vsom1a5fn4TT1
+aJbO5FsZTXQMxFLYWiSOMhTZMp3iNduxMYPosngjjKPEIkTQHKkedpF+CAGIMOKE
+BVa0vHyF34laKMMDT8d9yxwBJLqjlBohNsLLZa/Y90ThaMw+QYn/GZATB+7ng+ip
+VdGAQrghsGSxP+47HZ6WgBrlRdUWN1d1tlN2NBMHLucpbra5THGzl5MlaSVBYZb6
+yXI+2lwcTnnEkKv2zoA4ZHWdtLn/b1y4NKNg205TA+sOZcl6B1BgMe/rFuXdZe9Q
+/b6Tjz65qL4y1ByBVBJNhQQairw6cypHzwzC3w6ub1ZXtFqnTlU8fFcHGeOyydYS
+NfoepF0w2v0ounqD+6rN1CH/ERVb4FCEN19HQ3z+rAj19z2h6m/l5QEKI7bz8ghD
+8yxyqJz+L9XpfOo1yZfHQJckilY6BBIGWyeetJBmvkwv2WPt+3pX1u7h5LkvNRj2
+3fItf486zqtzUi+i/E//rS4gD/rRr4a85U8GSfp3LSAbtmfC0LNYUYA9Dcc0LSpl
+9alNuEpBHSHXlCVh4bcOb0L9n5XNdMcUYBo14hQdP0K1G7TounuAXFKYIQeyNyoi
+OAZ+eb7Y2xNnkY/ps/kyhsZgOJyiDZhdcruK3FIUGYlg5aVjQTB8H0c3/5SZnSky
+6779yMKztFXj9ctYU0YyJXWdF0xP/vi1gjQx/hJnDfXFfIOmeJdQSC08BGyK/PeC
+8zAS380bgzOza/eBL6IK0RqytbWgdoLrUQQfa1+f7AQxDDdoOkUenM0HSWjKfCuG
+m1/N7KUDHtnjVIHWqRefTPg1/tQjVY8/zgxN8MyAy+D95y4rawjsJf1dL6c0+zGv
+Wd40Cr+wAdHKN6t/oransoxu0EZ3HcSOI1umFg==
+-----END CERTIFICATE-----
 )";
   const char* kKey = R"(
 -----BEGIN RSA PRIVATE KEY-----

http://git-wip-us.apache.org/repos/asf/kudu/blob/3e59fd7b/src/kudu/security/tls_context.cc
----------------------------------------------------------------------
diff --git a/src/kudu/security/tls_context.cc b/src/kudu/security/tls_context.cc
index 28e180b..f708ac7 100644
--- a/src/kudu/security/tls_context.cc
+++ b/src/kudu/security/tls_context.cc
@@ -182,7 +182,7 @@ Status TlsContext::VerifyCertChainUnlocked(const Cert& cert) {
   X509_STORE* store = SSL_CTX_get_cert_store(ctx_.get());
   auto store_ctx = ssl_make_unique<X509_STORE_CTX>(X509_STORE_CTX_new());
 
-  OPENSSL_RET_NOT_OK(X509_STORE_CTX_init(store_ctx.get(), store, cert.GetEndOfChainX509(), nullptr),
+  OPENSSL_RET_NOT_OK(X509_STORE_CTX_init(store_ctx.get(), store, cert.GetTopOfChainX509(), nullptr),
                      "could not init X509_STORE_CTX");
   int rc = X509_verify_cert(store_ctx.get());
   if (rc != 1) {
@@ -226,7 +226,7 @@ Status TlsContext::UseCertificateAndKey(const Cert& cert, const PrivateKey& key)
 
   OPENSSL_RET_NOT_OK(SSL_CTX_use_PrivateKey(ctx_.get(), key.GetRawData()),
                      "failed to use private key");
-  OPENSSL_RET_NOT_OK(SSL_CTX_use_certificate(ctx_.get(), cert.GetEndOfChainX509()),
+  OPENSSL_RET_NOT_OK(SSL_CTX_use_certificate(ctx_.get(), cert.GetTopOfChainX509()),
                      "failed to use certificate");
   has_cert_ = true;
   return Status::OK();
@@ -355,7 +355,7 @@ Status TlsContext::GenerateSelfSignedCertAndKey() {
   // a leak. Calling this nonsense X509_check_ca() forces the X509 extensions to
   // get cached, so we don't hit the race later. 'VerifyCertChain' also has the
   // effect of triggering the racy codepath.
-  ignore_result(X509_check_ca(cert.GetEndOfChainX509()));
+  ignore_result(X509_check_ca(cert.GetTopOfChainX509()));
   ERR_clear_error(); // in case it left anything on the queue.
 
   // Step 4: Adopt the new key and cert.
@@ -363,7 +363,7 @@ Status TlsContext::GenerateSelfSignedCertAndKey() {
   CHECK(!has_cert_);
   OPENSSL_RET_NOT_OK(SSL_CTX_use_PrivateKey(ctx_.get(), key.GetRawData()),
                      "failed to use private key");
-  OPENSSL_RET_NOT_OK(SSL_CTX_use_certificate(ctx_.get(), cert.GetEndOfChainX509()),
+  OPENSSL_RET_NOT_OK(SSL_CTX_use_certificate(ctx_.get(), cert.GetTopOfChainX509()),
                      "failed to use certificate");
   has_cert_ = true;
   csr_ = std::move(csr);
@@ -403,7 +403,7 @@ Status TlsContext::AdoptSignedCert(const Cert& cert) {
     return Status::RuntimeError("certificate public key does not match the CSR public key");
   }
 
-  OPENSSL_RET_NOT_OK(SSL_CTX_use_certificate(ctx_.get(), cert.GetEndOfChainX509()),
+  OPENSSL_RET_NOT_OK(SSL_CTX_use_certificate(ctx_.get(), cert.GetTopOfChainX509()),
                      "failed to use certificate");
 
   // This should never fail since we already compared the cert's public key

http://git-wip-us.apache.org/repos/asf/kudu/blob/3e59fd7b/src/kudu/security/tls_handshake.cc
----------------------------------------------------------------------
diff --git a/src/kudu/security/tls_handshake.cc b/src/kudu/security/tls_handshake.cc
index 6daea6b..254b553 100644
--- a/src/kudu/security/tls_handshake.cc
+++ b/src/kudu/security/tls_handshake.cc
@@ -137,7 +137,7 @@ Status TlsHandshake::Verify(const Socket& socket) const {
   }
 
   // Get the peer certificate.
-  X509* cert = remote_cert_.GetEndOfChainX509();
+  X509* cert = remote_cert_.GetTopOfChainX509();
   if (!cert) {
     if (SSL_get_verify_mode(ssl_.get()) & SSL_VERIFY_FAIL_IF_NO_PEER_CERT) {
       return Status::NotAuthorized("Handshake failed: unable to retreive peer certificate");


[2/3] kudu git commit: maintenance_manager: fix a deadlock on shutdown

Posted by to...@apache.org.
maintenance_manager: fix a deadlock on shutdown

The shutdown sequence of the tablet server first shuts down the maintenance
manager and then calls Unregister() on the registered ops.

This produced a potential hang on shutdown, since the 'Shutdown()' call could
run at the same time that some maintenance ops were waiting on the thread_pool_
queue. Those waiting functions would be removed from the queue silently. We
depend on the functions running to decrement the 'running_' count of the associated
op, so when they were removed silently, the 'Unregister()' call could block forever
waiting for the 'running_' count to go to 0.

This caused a timeout of about 0.5% of runs of the new stop-tablet-itest
'TestShutdownWhileWriting' test case. With this fix, no runs time out.

Change-Id: Icaf864299bfd43212bc9655f48128851b9c1d59b
Reviewed-on: http://gerrit.cloudera.org:8080/8616
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <mp...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/36a5e766
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/36a5e766
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/36a5e766

Branch: refs/heads/master
Commit: 36a5e7669551d0f79c340eadb02ce8d51defb590
Parents: 63b53ad
Author: Todd Lipcon <to...@apache.org>
Authored: Mon Nov 20 23:05:40 2017 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Tue Nov 21 18:07:54 2017 +0000

----------------------------------------------------------------------
 src/kudu/util/maintenance_manager.cc | 5 +++++
 1 file changed, 5 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/36a5e766/src/kudu/util/maintenance_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/util/maintenance_manager.cc b/src/kudu/util/maintenance_manager.cc
index 8984de8..50d792e 100644
--- a/src/kudu/util/maintenance_manager.cc
+++ b/src/kudu/util/maintenance_manager.cc
@@ -183,6 +183,11 @@ void MaintenanceManager::Shutdown() {
   if (monitor_thread_.get()) {
     CHECK_OK(ThreadJoiner(monitor_thread_.get()).Join());
     monitor_thread_.reset();
+    // Wait for all the running and queued tasks before shutting down. Otherwise,
+    // Shutdown() can remove a queued task silently. We count on eventually running the
+    // queued tasks to decrement their "running" count, which is incremented at the time
+    // they are enqueued.
+    thread_pool_->Wait();
     thread_pool_->Shutdown();
   }
 }