Posted to commits@kudu.apache.org by al...@apache.org on 2018/09/10 20:37:39 UTC

[1/2] kudu git commit: [docs] Add "one client only" best practice for kudu-spark

Repository: kudu
Updated Branches:
  refs/heads/master b552d9118 -> 953a09b82


[docs] Add "one client only" best practice for kudu-spark

Change-Id: Ibaf369315b8627674ba64e6418d153568ded6fe8
Reviewed-on: http://gerrit.cloudera.org:8080/11409
Tested-by: Will Berkeley <wd...@gmail.com>
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/e3570519
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/e3570519
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/e3570519

Branch: refs/heads/master
Commit: e3570519b200a0ffbd713798bc8aabd6f36ed3b7
Parents: b552d91
Author: Will Berkeley <wd...@gmail.com>
Authored: Mon Sep 10 10:45:30 2018 -0700
Committer: Will Berkeley <wd...@gmail.com>
Committed: Mon Sep 10 18:43:43 2018 +0000

----------------------------------------------------------------------
 docs/developing.adoc | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/e3570519/docs/developing.adoc
----------------------------------------------------------------------
diff --git a/docs/developing.adoc b/docs/developing.adoc
index 98db2ba..49d8c7e 100644
--- a/docs/developing.adoc
+++ b/docs/developing.adoc
@@ -217,6 +217,23 @@ mode, the submitting user must have an active Kerberos ticket granted through
 name and keytab location must be provided through the `--principal` and
 `--keytab` arguments to `spark2-submit`.
 
+=== Spark Integration Best Practices
+
+==== Avoid multiple Kudu clients per cluster.
+
+One common Kudu-Spark coding error is instantiating extra `KuduClient` objects.
+In kudu-spark, a `KuduClient` is owned by the `KuduContext`. Spark application code
+should not create another `KuduClient` connecting to the same cluster. Instead,
+application code should use the `KuduContext` to access a `KuduClient` using
+`KuduContext#syncClient`.
+
+To diagnose multiple `KuduClient` instances in a Spark job, look for signs in
+the logs of the master being overloaded by many `GetTableLocations` or
+`GetTabletLocations` requests coming from different clients, usually around the
+same time. This symptom is especially likely in Spark Streaming code,
+where creating a `KuduClient` per task will result in periodic waves of master
+requests from new clients.
+
 === Spark Integration Known Issues and Limitations
 
 - Spark 2.2+ requires Java 8 at runtime even though Kudu Spark 2.x integration


[2/2] kudu git commit: [catalog_manager] updated warning message

Posted by al...@apache.org.
[catalog_manager] updated warning message

Updated the warning message logged upon a failure to allocate
an extra replica for a tablet: don't call the new replica a replacement
because AsyncAddReplicaTask is used not only in re-replication scenarios,
but in replica movement scenarios as well.

Also, do not mention unsetting --raft_prepare_replacement_before_eviction
since the 3-2-3 replica management scheme is deprecated at this point
and is no better in the case of a whole tablet server failure.

This is a follow-up to 1fcce42200d22597e7e69baa7232b4de93d5e2a3.

There are no functional changes in this patch.

Change-Id: Ifb6905dc1870acd34553187594008ba34781ce6d
Reviewed-on: http://gerrit.cloudera.org:8080/11402
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/953a09b8
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/953a09b8
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/953a09b8

Branch: refs/heads/master
Commit: 953a09b826b4a0ccd379480f7aac441186f8bacb
Parents: e357051
Author: Alexey Serbin <as...@cloudera.com>
Authored: Fri Sep 7 16:35:33 2018 -0700
Committer: Alexey Serbin <as...@cloudera.com>
Committed: Mon Sep 10 20:35:04 2018 +0000

----------------------------------------------------------------------
 src/kudu/master/catalog_manager.cc | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/953a09b8/src/kudu/master/catalog_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/master/catalog_manager.cc b/src/kudu/master/catalog_manager.cc
index 0dd8003..27e9722 100644
--- a/src/kudu/master/catalog_manager.cc
+++ b/src/kudu/master/catalog_manager.cc
@@ -3475,14 +3475,14 @@ bool AsyncAddReplicaTask::SendRequest(int attempt) {
 
   auto replacement_replica = SelectReplica(ts_descs, excluded, rng_);
   if (PREDICT_FALSE(!replacement_replica)) {
-    auto msg = Substitute("no candidate replacement replica found for tablet $0",
+    auto msg = Substitute("no extra replica candidate found for tablet $0",
                           tablet_->ToString());
-    // Check whether it's a situation when a replacement replica cannot be found
+    // Check whether it's a situation when an extra replica cannot be found
     // due to an inconsistency in cluster configuration. If the tablet has the
-    // replication factor of N, and the cluster is configured to use N->(N+1)->N
-    // replication scheme (see --raft_prepare_replacement_before_eviction flag),
-    // at least N+1 tablet servers should be registered to find a place
-    // for a replacement replica.
+    // replication factor of N, and the cluster is using the N->(N+1)->N
+    // replica management scheme (see --raft_prepare_replacement_before_eviction
+    // flag), at least N+1 tablet servers should be registered to find a place
+    // for an extra replica.
     TSDescriptorVector all_descriptors;
     master_->ts_manager()->GetAllDescriptors(&all_descriptors);
     const auto num_tservers_registered = all_descriptors.size();
@@ -3492,21 +3492,16 @@ bool AsyncAddReplicaTask::SendRequest(int attempt) {
       TableMetadataLock l(tablet_->table().get(), LockMode::READ);
       replication_factor = tablet_->table()->metadata().state().pb.num_replicas();
     }
-    DCHECK_GE(replication_factor, 0);
+    DCHECK_GE(replication_factor, 1);
     const auto num_tservers_needed =
         FLAGS_raft_prepare_replacement_before_eviction ? replication_factor + 1
                                                        : replication_factor;
     if (num_tservers_registered < num_tservers_needed) {
       msg += Substitute(
-          "; the total number of registered tablet servers ($0) does not allow "
-          "for replacement of the failed replica: at least $1 tablet servers "
-          "are required", num_tservers_registered, num_tservers_needed);
-      if (FLAGS_raft_prepare_replacement_before_eviction &&
-          num_tservers_registered == replication_factor) {
-        msg +=
-          "; consider either adding an additional tablet server or running "
-          "the cluster with --raft_prepare_replacement_before_eviction=false";
-      }
+          ": the total number of registered tablet servers ($0) does not allow "
+          "for adding an extra replica; consider bringing up more "
+          "to have at least $1 tablet servers up and running",
+          num_tservers_registered, num_tservers_needed);
     }
     KLOG_EVERY_N_SECS(WARNING, 60) << LogPrefix() << msg;
     return false;
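
The server-count check behind the updated warning can be sketched in isolation. The following is a minimal, self-contained approximation, not the code in catalog_manager.cc: the function name BuildNoCandidateMessage and its parameters are hypothetical, the boolean argument stands in for the --raft_prepare_replacement_before_eviction flag, and plain string concatenation replaces Substitute().

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the message-building part of
// AsyncAddReplicaTask::SendRequest() shown in the diff above.
// 'prepare_replacement_before_eviction' mirrors the value of the
// --raft_prepare_replacement_before_eviction flag (an assumption of this
// sketch; the real code reads the gflag directly).
std::string BuildNoCandidateMessage(const std::string& tablet_id,
                                    int replication_factor,
                                    int num_tservers_registered,
                                    bool prepare_replacement_before_eviction) {
  // Same precondition as the DCHECK_GE in the patched code.
  assert(replication_factor >= 1);

  std::string msg = "no extra replica candidate found for tablet " + tablet_id;

  // With the N->(N+1)->N scheme, one more tablet server than the replication
  // factor is needed to host the extra replica.
  const int num_tservers_needed = prepare_replacement_before_eviction
      ? replication_factor + 1
      : replication_factor;

  // Append the hint only when the cluster is too small.
  if (num_tservers_registered < num_tservers_needed) {
    msg += ": the total number of registered tablet servers (" +
           std::to_string(num_tservers_registered) +
           ") does not allow for adding an extra replica; consider bringing "
           "up more to have at least " +
           std::to_string(num_tservers_needed) +
           " tablet servers up and running";
  }
  return msg;
}
```

Under these assumptions, a tablet with replication factor 3 on a cluster with 3 registered tablet servers gets the extended hint when the flag is enabled, and only the short "no extra replica candidate found" text when it is disabled or when 4 or more servers are registered.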