Posted to commits@kudu.apache.org by gr...@apache.org on 2019/02/05 22:18:12 UTC

[kudu] branch master updated (8dc7904 -> 73cda9f)

This is an automated email from the ASF dual-hosted git repository.

granthenke pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git.


    from 8dc7904  Add a feature flag for deleting dead containers
     new 7263cd8  [spark] Add write duration histograms
     new fd089bf  [rebalancer] always output summary on policy violations
     new 453e008  [master] add --master_client_location_assignment_enabled flag
     new 73cda9f  [docker] Optimize kudu image size

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .dockerignore                                      |   3 +-
 docker/Dockerfile                                  |  92 +++++++++-----
 docker/README.adoc                                 |  11 +-
 docker/{bootstrap-env.sh => bootstrap-dev-env.sh}  |   0
 docker/bootstrap-runtime-env.sh                    |  73 +++++++++++
 docker/docker-build.sh                             | 141 +++++++++++++--------
 java/gradle/dependencies.gradle                    |   2 +
 java/kudu-spark/build.gradle                       |   1 +
 .../kudu/spark/kudu/HdrHistogramAccumulator.scala  | 102 +++++++++++++++
 .../org/apache/kudu/spark/kudu/KuduContext.scala   |   9 +-
 src/kudu/master/master_service.cc                  |  12 +-
 src/kudu/tools/rebalancer.cc                       |  48 +++----
 12 files changed, 375 insertions(+), 119 deletions(-)
 rename docker/{bootstrap-env.sh => bootstrap-dev-env.sh} (100%)
 create mode 100755 docker/bootstrap-runtime-env.sh
 create mode 100644 java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/HdrHistogramAccumulator.scala


[kudu] 01/04: [spark] Add write duration histograms

Posted by gr...@apache.org.

granthenke pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit 7263cd86c372d4ca71816ab97cbbf11bacf8e456
Author: Will Berkeley <wd...@gmail.org>
AuthorDate: Wed Jan 23 10:52:51 2019 -0800

    [spark] Add write duration histograms
    
    This adds an additional accumulator metric to KuduContext writes: write
    duration histograms. These histograms will show up on the web UI and in
    the driver logs, so it's easier to track how much time is spent writing
    to Kudu in Spark stages and tasks. Log messages on the driver look like:
    
    19/01/23 11:13:34 INFO kudu.KuduContext: completed insert ops: duration histogram: 25.0%: 14ms, 25.0%: 14ms, 75.0%: 17ms, 75.0%: 17ms, 75.0%: 17ms, 100.0%: 66ms, 100.0%: 66ms
    
    The funny repeated values are an artifact of having a cluster with only
    3 executors executing 4 tasks. Log messages on executors look like:
    
    19/01/23 11:13:34 INFO kudu.KuduContext: applied 69 inserts to table 'impala::default.aaa' in 14ms
    
    HdrHistograms need to be shipped between executors and the driver, so
    their (serialized) size is relevant. Spark users differ in how they
    serialize, so I didn't put much effort into estimating the serialized
    size, but based on the conservative formula in [1] the in-memory size of
    a histogram with 3 significant value digits and storing longs is 4MiB or
    so. That only happens if the histogram is storing values from 1 to the
    max trackable long value, which is Long.MAX_VALUE / 2. More realistically,
    the values in the duration histogram should be at most 86400 * 1000, the
    number of milliseconds in a day, and usually much, much smaller. For
    that range of values, the max footprint is 1MiB. That should be a safe
    amount of data to ship semi-frequently along with all the Kudu data
    (and that estimate doesn't account for any compression applied during
    serialization).
    
    [1]: https://github.com/HdrHistogram/HdrHistogram#footprint-estimation
    
    Change-Id: I0fd4d380b08bd7d7d5c1e65b79cffb44a9b9d433
    Reviewed-on: http://gerrit.cloudera.org:8080/12261
    Reviewed-by: Grant Henke <gr...@apache.org>
    Tested-by: Will Berkeley <wd...@gmail.com>
---
 java/gradle/dependencies.gradle                    |   2 +
 java/kudu-spark/build.gradle                       |   1 +
 .../kudu/spark/kudu/HdrHistogramAccumulator.scala  | 102 +++++++++++++++++++++
 .../org/apache/kudu/spark/kudu/KuduContext.scala   |   9 +-
 4 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/java/gradle/dependencies.gradle b/java/gradle/dependencies.gradle
index 017dbf6..9e8bd42 100755
--- a/java/gradle/dependencies.gradle
+++ b/java/gradle/dependencies.gradle
@@ -40,6 +40,7 @@ versions += [
     guava          : "27.0.1-android",
     hadoop         : "3.2.0",
     hamcrest       : "1.3",
+    hdrhistogram   : "2.1.11",
     hive           : "2.3.4",
     jepsen         : "0.1.5",
     jsr305         : "3.0.2",
@@ -87,6 +88,7 @@ libs += [
     hadoopMRClientCommon : "org.apache.hadoop:hadoop-mapreduce-client-common:$versions.hadoop",
     hadoopMRClientCore   : "org.apache.hadoop:hadoop-mapreduce-client-core:$versions.hadoop",
     hamcrestCore         : "org.hamcrest:hamcrest-core:$versions.hamcrest",
+    hdrhistogram         : "org.hdrhistogram:HdrHistogram:$versions.hdrhistogram",
     hiveMetastore        : "org.apache.hive:hive-metastore:$versions.hive",
     hiveMetastoreTest    : "org.apache.hive:hive-metastore:$versions.hive:tests",
     jepsen               : "jepsen:jepsen:$versions.jepsen",
diff --git a/java/kudu-spark/build.gradle b/java/kudu-spark/build.gradle
index 6a7d205..25d208b 100644
--- a/java/kudu-spark/build.gradle
+++ b/java/kudu-spark/build.gradle
@@ -20,6 +20,7 @@ apply from: "$rootDir/gradle/shadow.gradle"
 
 dependencies {
   compile project(path: ":kudu-client", configuration: "shadow")
+  compile libs.hdrhistogram
   // TODO(KUDU-2500): Spark uses reflection which requires the annotations at runtime.
   compile libs.yetusAnnotations
 
diff --git a/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/HdrHistogramAccumulator.scala b/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/HdrHistogramAccumulator.scala
new file mode 100644
index 0000000..c68b30d
--- /dev/null
+++ b/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/HdrHistogramAccumulator.scala
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kudu.spark.kudu
+
+import scala.collection.JavaConversions._
+
+import org.apache.spark.util.AccumulatorV2
+import org.HdrHistogram.HistogramIterationValue
+import org.HdrHistogram.SynchronizedHistogram
+
+/*
+ * A Spark accumulator that aggregates values into an HDR histogram.
+ *
+ * This class is a wrapper for a wrapper around an HdrHistogram[1]. The purpose
+ * of the double-wrapping is to work around how Spark displays accumulators in
+ * its web UI. Accumulators are displayed using AccumulatorV2#value's toString
+ * and not the toString method of the AccumulatorV2 (see [2]). So, to provide
+ * a useful display for the histogram on the web UI, we wrap the HdrHistogram
+ * in a wrapper class, implement toString on the wrapper class, and make the
+ * wrapper class the value class of the Accumulator.
+ *
+ * [1]: https://github.com/HdrHistogram/HdrHistogram
+ * [2]: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala#L216
+ */
+private[kudu] class HdrHistogramAccumulator(val histogram: HistogramWrapper)
+    extends AccumulatorV2[Long, HistogramWrapper] {
+
+  def this() = this(new HistogramWrapper(new SynchronizedHistogram(3)))
+
+  override def isZero: Boolean = {
+    histogram.inner_histogram.getTotalCount == 0
+  }
+
+  override def copy(): AccumulatorV2[Long, HistogramWrapper] = {
+    new HdrHistogramAccumulator(histogram.copy())
+  }
+
+  override def reset(): Unit = {
+    histogram.inner_histogram.reset()
+  }
+
+  override def add(v: Long): Unit = {
+    histogram.inner_histogram.recordValue(v)
+  }
+
+  override def merge(other: AccumulatorV2[Long, HistogramWrapper]): Unit = {
+    histogram.inner_histogram.add(other.value.inner_histogram)
+  }
+
+  override def value: HistogramWrapper = histogram
+
+  override def toString: String = histogram.toString
+}
+
+/*
+ * A wrapper for a SychronizedHistogram from the HdrHistogram library. See the
+ * comment on the declaration of the HdrHistogramAccumulator for why this class
+ * exists.
+ *
+ * A synchronized histogram is used because accumulators may be read from
+ * multiple threads concurrently.
+ */
+private[kudu] class HistogramWrapper(val inner_histogram: SynchronizedHistogram)
+    extends Serializable {
+
+  def copy(): HistogramWrapper = {
+    new HistogramWrapper(inner_histogram.copy())
+  }
+
+  override def toString: String = {
+    inner_histogram.synchronized {
+      if (inner_histogram.getTotalCount == 1) {
+        return s"${inner_histogram.getMinValue}ms"
+      }
+      // The argument to SynchronizedHistogram#percentiles is the number of
+      // ticks per half distance to 100%. So, a value of 1 produces values for
+      // the percentiles 50, 75, 87.5, ~95, ~97.5, etc., until all histogram
+      // values have been exhausted. It's a little wonky if there are very few
+      // values in the histogram-- it might print out the same percentile a
+      // couple of times- but it's really nice for larger histograms.
+      val pvs = for (pv: HistogramIterationValue <- inner_histogram.percentiles(1)) yield {
+        s"${pv.getPercentile}%: ${pv.getValueIteratedTo}ms"
+      }
+      pvs.mkString(", ")
+    }
+  }
+}
diff --git a/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala b/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
index a8716f3..c12d059 100644
--- a/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
+++ b/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
@@ -122,6 +122,9 @@ class KuduContext(val kuduMaster: String, sc: SparkContext, val socketReadTimeou
   val timestampAccumulator = new TimestampAccumulator()
   sc.register(timestampAccumulator)
 
+  val durationHistogram = new HdrHistogramAccumulator()
+  sc.register(durationHistogram, "kudu.write_duration")
+
   @Deprecated()
   def this(kuduMaster: String) {
     this(kuduMaster, new SparkContext())
@@ -342,6 +345,7 @@ class KuduContext(val kuduMaster: String, sc: SparkContext, val socketReadTimeou
           s"failed to write $errorCount rows from DataFrame to Kudu; sample errors: $errors")
       }
     })
+    log.info(s"completed $operation ops: duration histogram: $durationHistogram")
   }
 
   private def writePartitionRows(
@@ -365,6 +369,7 @@ class KuduContext(val kuduMaster: String, sc: SparkContext, val socketReadTimeou
     val typeConverter = CatalystTypeConverters.createToScalaConverter(schema)
     var numRows = 0
     log.info(s"applying operations of type '${opType.toString}' to table '$tableName'")
+    val startTime = System.currentTimeMillis()
     try {
       for (internalRow <- rows) {
         val row = typeConverter(internalRow).asInstanceOf[Row]
@@ -417,7 +422,9 @@ class KuduContext(val kuduMaster: String, sc: SparkContext, val socketReadTimeou
       // timestamp on each executor.
       timestampAccumulator.add(syncClient.getLastPropagatedTimestamp)
       addForOperation(numRows, opType)
-      log.info(s"applied $numRows operations of type '${opType.toString()}' to table '$tableName'")
+      val elapsedTime = System.currentTimeMillis() - startTime
+      durationHistogram.add(elapsedTime)
+      log.info(s"applied $numRows ${opType}s to table '$tableName' in ${elapsedTime}ms")
     }
     session.getPendingErrors
   }


[kudu] 03/04: [master] add --master_client_location_assignment_enabled flag


granthenke pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit 453e008e6a57f631d8660101630b4dda60129d65
Author: Alexey Serbin <al...@apache.org>
AuthorDate: Fri Feb 1 16:04:54 2019 -0800

    [master] add --master_client_location_assignment_enabled flag
    
    Added a test-only runtime flag --master_client_location_assignment_enabled,
    which is useful for test scenarios where it's necessary to assign
    locations to tablet servers with assign-location.py but the
    --relaxed option cannot be used.
    
    Change-Id: Ie9631b5638b36d71e8a5a24144e857ffeeafeac5
    Reviewed-on: http://gerrit.cloudera.org:8080/12337
    Tested-by: Kudu Jenkins
    Reviewed-by: Will Berkeley <wd...@gmail.com>
---
 src/kudu/master/master_service.cc | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/kudu/master/master_service.cc b/src/kudu/master/master_service.cc
index b87e681..4a29a61 100644
--- a/src/kudu/master/master_service.cc
+++ b/src/kudu/master/master_service.cc
@@ -77,6 +77,15 @@ DEFINE_bool(master_non_leader_masters_propagate_tsk, false,
             "tests scenarios only and should not be used elsewhere.");
 TAG_FLAG(master_non_leader_masters_propagate_tsk, hidden);
 
+DEFINE_bool(master_client_location_assignment_enabled, true,
+            "Whether masters assign locations to connecting clients. "
+            "By default they do if the location assignment command is set, "
+            "but in some test scenarios it's useful to make masters assign "
+            "locations only to tablet servers, but not clients.");
+TAG_FLAG(master_client_location_assignment_enabled, hidden);
+TAG_FLAG(master_client_location_assignment_enabled, runtime);
+TAG_FLAG(master_client_location_assignment_enabled, unsafe);
+
 using google::protobuf::Message;
 using kudu::consensus::ReplicaManagementInfoPB;
 using kudu::pb_util::SecureDebugString;
@@ -524,7 +533,8 @@ void MasterServiceImpl::ConnectToMaster(const ConnectToMasterRequestPB* /*req*/,
   }
 
   // Assign a location to the client if needed.
-  if (!FLAGS_location_mapping_cmd.empty()) {
+  if (!FLAGS_location_mapping_cmd.empty() &&
+      PREDICT_TRUE(FLAGS_master_client_location_assignment_enabled)) {
     string location;
     Status s = GetLocationFromLocationMappingCmd(FLAGS_location_mapping_cmd,
                                                  rpc->remote_address().host(),


[kudu] 02/04: [rebalancer] always output summary on policy violations


granthenke pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit fd089bf259d3e17f2801ee3122c6fe7844efd4ae
Author: Alexey Serbin <al...@apache.org>
AuthorDate: Mon Feb 4 15:43:26 2019 -0800

    [rebalancer] always output summary on policy violations
    
    This patch makes the location-aware rebalancer always output a summary
    of the detected violations of the replica placement policy.  Detailed
    information on the violations is additionally output if the runtime
    flag --output_replica_distribution_details is specified.
    
    Prior to this patch, the rebalancer would output either the summary
    or the detailed information on the detected violations of the replica
    placement policy, depending on the presence of the
    --output_replica_distribution_details flag.
    
    Change-Id: I5f68cebcef9a3c02203e80414082a0994a1721ee
    Reviewed-on: http://gerrit.cloudera.org:8080/12358
    Tested-by: Kudu Jenkins
    Reviewed-by: Will Berkeley <wd...@gmail.com>
---
 src/kudu/tools/rebalancer.cc | 48 +++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/src/kudu/tools/rebalancer.cc b/src/kudu/tools/rebalancer.cc
index 51e1869..72b81c5 100644
--- a/src/kudu/tools/rebalancer.cc
+++ b/src/kudu/tools/rebalancer.cc
@@ -632,7 +632,31 @@ Status Rebalancer::PrintPolicyViolationInfo(const ClusterRawInfo& raw_info,
     return Status::OK();
   }
 
+  DataTable summary({ "Location",
+                      "Number of non-complying tables",
+                      "Number of non-complying tablets" });
+  typedef pair<unordered_set<string>, unordered_set<string>> TableTabletIds;
+  // Location --> sets of identifiers of tables and tablets hosted by the
+  // tablet servers at the location. The summary is sorted by location.
+  map<string, TableTabletIds> info_by_location;
+  for (const auto& info : ppvi) {
+    const auto& table_id = FindOrDie(placement_info.tablet_to_table_id,
+                                     info.tablet_id);
+    auto& elem = LookupOrEmplace(&info_by_location,
+                                 info.majority_location, TableTabletIds());
+    elem.first.emplace(table_id);
+    elem.second.emplace(info.tablet_id);
+  }
+  for (const auto& elem : info_by_location) {
+    summary.AddRow({ elem.first,
+                     to_string(elem.second.first.size()),
+                     to_string(elem.second.second.size()) });
+  }
+  RETURN_NOT_OK(summary.PrintTo(out));
+  out << endl;
+  // If requested, print details on detected policy violations.
   if (config_.output_replica_distribution_details) {
+    out << "Placement policy violation details:" << endl;
     DataTable stats(
         { "Location", "Table Name", "Tablet", "RF", "Replicas at location" });
     for (const auto& info : ppvi) {
@@ -646,30 +670,8 @@ Status Rebalancer::PrintPolicyViolationInfo(const ClusterRawInfo& raw_info,
                      to_string(info.replicas_num_at_majority_location) });
     }
     RETURN_NOT_OK(stats.PrintTo(out));
-  } else {
-    DataTable summary({ "Location",
-                        "Number of non-complying tables",
-                        "Number of non-complying tablets" });
-    typedef pair<unordered_set<string>, unordered_set<string>> TableTabletIds;
-    // Location --> sets of identifiers of tables and tablets hosted by the
-    // tablet servers at the location. The summary is sorted by location.
-    map<string, TableTabletIds> info_by_location;
-    for (const auto& info : ppvi) {
-      const auto& table_id = FindOrDie(placement_info.tablet_to_table_id,
-                                       info.tablet_id);
-      auto& elem = LookupOrEmplace(&info_by_location,
-                                   info.majority_location, TableTabletIds());
-      elem.first.emplace(table_id);
-      elem.second.emplace(info.tablet_id);
-    }
-    for (const auto& elem : info_by_location) {
-      summary.AddRow({ elem.first,
-                       to_string(elem.second.first.size()),
-                       to_string(elem.second.second.size()) });
-    }
-    RETURN_NOT_OK(summary.PrintTo(out));
+    out << endl;
   }
-  out << endl;
 
   return Status::OK();
 }


[kudu] 04/04: [docker] Optimize kudu image size


granthenke pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit 73cda9f611246902b982708ffa0baeed63c220b2
Author: Grant Henke <gr...@apache.org>
AuthorDate: Tue Feb 5 13:20:58 2019 -0600

    [docker] Optimize kudu image size
    
    This patch optimizes the docker kudu image size.
    
    A runtime base image was created that installs only the
    minimal runtime packages. Additionally the kudu binaries
    were stripped to reduce their size. The Java and Python
    libraries were also removed from the kudu image. A client
    image can be created specifically for python/java as
    needed and as a result those images can be much smaller
    too.
    
    This change reduces the kudu image from ~2GiB to
    ~325MiB. The base OS (xenial) is ~44MiB. The new
    runtime base image is ~174MiB. The Kudu binaries
    are ~148MB. In the future we could reduce the
    binary size further by allowing the kudu master and
    tserver to be started from the kudu binary.
    
    Additionally, the build script now builds and tags only the kudu
    image by default and accepts a few more arguments (the BASES,
    TARGETS, and TAG_HASH environment variables). These changes make
    the builds cleaner and reduce the disk space used by tagged builds.
    
    Change-Id: I2a1fdad54744a1680d362f59433e94749f35469f
    Reviewed-on: http://gerrit.cloudera.org:8080/12371
    Reviewed-by: Andrew Wong <aw...@cloudera.com>
    Tested-by: Kudu Jenkins
---
 .dockerignore                                     |   3 +-
 docker/Dockerfile                                 |  92 ++++++++------
 docker/README.adoc                                |  11 +-
 docker/{bootstrap-env.sh => bootstrap-dev-env.sh} |   0
 docker/bootstrap-runtime-env.sh                   |  73 +++++++++++
 docker/docker-build.sh                            | 141 +++++++++++++---------
 6 files changed, 226 insertions(+), 94 deletions(-)

diff --git a/.dockerignore b/.dockerignore
index 423e394..5316c91 100644
--- a/.dockerignore
+++ b/.dockerignore
@@ -24,7 +24,8 @@
 !version.txt
 
 # Docker files.
-!docker/bootstrap-env.sh
+!docker/bootstrap-dev-env.sh
+!docker/bootstrap-runtime-env.sh
 !docker/kudu-entrypoint.sh
 
 # Docs files.
diff --git a/docker/Dockerfile b/docker/Dockerfile
index 09583e3..7e99ee5 100644
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -25,17 +25,14 @@
 #   http://label-schema.org/rc1/
 
 #
-# ---- Base ----
-# Builds a base image that has all prerequisite libraries for
-# development and runtime pre-installed.
-# TODO: Consider a separate runtime-base and buildtime-base to make
-#   runtime images smaller.
+# ---- Runtime ----
+# Builds a base image that has all the runtime libraries for Kudu pre-installed.
 #
 ARG BASE_OS
-FROM $BASE_OS as base
+FROM $BASE_OS as runtime
 
-COPY ./docker/bootstrap-env.sh /
-RUN ./bootstrap-env.sh && rm bootstrap-env.sh
+COPY ./docker/bootstrap-runtime-env.sh /
+RUN ./bootstrap-runtime-env.sh && rm bootstrap-runtime-env.sh
 
 # Common label arguments.
 # VCS_REF is not specified to improve docker caching.
@@ -46,9 +43,39 @@ ARG VCS_TYPE
 ARG VCS_URL
 ARG VERSION
 
-LABEL org.label-schema.name="Apache Kudu Base" \
-      org.label-schema.description="A base image that has all prerequisite \
-        libraries for development and runtime pre-installed." \
+LABEL org.label-schema.name="Apache Kudu Runtime Base" \
+      org.label-schema.description="A base image that has all the runtime \
+        libraries for Kudu pre-installed." \
+      # Common labels.
+      org.label-schema.dockerfile=$DOCKERFILE \
+      org.label-schema.maintainer=$MAINTAINER \
+      org.label-schema.url=$URL \
+      org.label-schema.vcs-type=$VCS_TYPE \
+      org.label-schema.vcs-url=$VCS_URL \
+      org.label-schema.version=$VERSION
+
+#
+# ---- Dev ----
+# Builds a base image that has all the development libraries for Kudu pre-installed.
+#
+ARG BASE_OS
+FROM $BASE_OS as dev
+
+COPY ./docker/bootstrap-dev-env.sh /
+RUN ./bootstrap-dev-env.sh && rm bootstrap-dev-env.sh
+
+# Common label arguments.
+# VCS_REF is not specified to improve docker caching.
+ARG DOCKERFILE
+ARG MAINTAINER
+ARG URL
+ARG VCS_TYPE
+ARG VCS_URL
+ARG VERSION
+
+LABEL org.label-schema.name="Apache Kudu Development Base" \
+      org.label-schema.description="A base image that has all the development \
+        libraries for Kudu pre-installed." \
       # Common labels.
       org.label-schema.dockerfile=$DOCKERFILE \
       org.label-schema.maintainer=$MAINTAINER \
@@ -63,7 +90,7 @@ LABEL org.label-schema.name="Apache Kudu Base" \
 # This is done in its own stage so that docker can cache it and only
 # run it when thirdparty has changes.
 #
-FROM base AS thirdparty
+FROM dev AS thirdparty
 
 WORKDIR /kudu
 # We only copy the needed files for thirdparty so docker can handle caching.
@@ -111,9 +138,15 @@ LABEL name="Apache Kudu Thirdparty" \
 #
 FROM thirdparty AS build
 
-# TODO: Support other buildtypes.
 ARG BUILD_TYPE=release
+ARG LINK_TYPE=static
+ARG STRIP=1
 ARG PARALLEL=4
+# This is a common label argument, but also used in the build invocation.
+ARG VCS_REF
+
+# Use the bash shell for all RUN commands.
+SHELL ["/bin/bash", "-c"]
 
 WORKDIR /kudu
 # Copy the C++ build source.
@@ -132,12 +165,19 @@ ENV NO_REBUILD_THIRDPARTY=1
 RUN ../../build-support/enable_devtoolset.sh \
   ../../thirdparty/installed/common/bin/cmake \
   -DCMAKE_BUILD_TYPE=$BUILD_TYPE \
+  -DKUDU_LINK=$LINK_TYPE \
+  -DKUDU_GIT_HASH=$VCS_REF \
   # The release build is massive with tests built.
   -DNO_TESTS=1 \
   ../.. \
   && make -j${PARALLEL} \
   # Install the client libraries for the python build to use.
-  && make install
+  # TODO: Use custom install location when the python build can be configured to use it.
+  && make install \
+  # Strip the binaries to reduce the images size.
+  && if [ "$STRIP" == "1" ]; then find "bin" -name "kudu*" -type f -exec strip {} \;; fi \
+  # Strip the client libraries to reduce the images size
+  && if [[ "$STRIP" == "1" ]]; then find "/usr/local" -name "libkudu*" -type f -exec strip {} \;; fi
 
 # Copy the java build source.
 COPY ./java /kudu/java
@@ -159,7 +199,6 @@ COPY . /kudu
 ARG DOCKERFILE
 ARG MAINTAINER
 ARG URL
-ARG VCS_REF
 ARG VCS_TYPE
 ARG VCS_URL
 ARG VERSION
@@ -167,6 +206,8 @@ ARG VERSION
 LABEL name="Apache Kudu Build" \
       description="An image that has the Kudu source code pre-built." \
       org.apache.kudu.build.type=$BUILD_TYPE \
+      org.apache.kudu.build.link=$LINK_TYPE \
+      org.apache.kudu.build.stripped=$STRIP \
       # Common labels.
       org.label-schema.dockerfile=$DOCKERFILE \
       org.label-schema.maintainer=$MAINTAINER \
@@ -178,9 +219,9 @@ LABEL name="Apache Kudu Build" \
 
 #
 # ---- Kudu ----
-# Builds a runtime image with the Kudu binaries and clients pre-installed.
+# Builds a runtime image with the Kudu binaries pre-installed.
 #
-FROM base AS kudu
+FROM runtime AS kudu
 
 ARG UID=1000
 ARG GID=1000
@@ -197,24 +238,7 @@ COPY --from=build \
 # Add to the binaries to the path.
 ENV PATH=$INSTALL_DIR/bin/:$PATH
 
-# Copy the python files and install.
-WORKDIR $INSTALL_DIR/python
-COPY --from=build /usr/local /usr/local/
-ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64
-COPY --from=build /kudu/python/dist/kudu-python-*.tar.gz .
-RUN pip install kudu-python-*.tar.gz
-
-# Copy the Java application jars.
-WORKDIR $INSTALL_DIR/java
-COPY --from=build \
- /kudu/java/kudu-backup/build/libs/*.jar \
- /kudu/java/kudu-client-tools/build/libs/*.jar \
- /kudu/java/kudu-spark-tools/build/libs/*.jar \
- ./
-
 WORKDIR $INSTALL_DIR
-# Copy the lib files.
-COPY --from=build /kudu/build/latest/lib ./lib
 # Copy the web files.
 COPY --from=build /kudu/www ./www
 COPY ./docker/kudu-entrypoint.sh /
diff --git a/docker/README.adoc b/docker/README.adoc
index 2a2a6fc..1e9076e 100644
--- a/docker/README.adoc
+++ b/docker/README.adoc
@@ -91,6 +91,7 @@ A runtime image with the Kudu binaries and clients pre-installed
 and an entrypoint script that enables easily starting Kudu
 masters and tservers along with executing other commands.
 Copies the built artifacts and files from the kudu:build image.
+Uses the kudu:runtime image as a base.
 
 === apache/kudu:build-[OS]-[VERSION]
 An image that has the Kudu source code pre-built.
@@ -98,11 +99,13 @@ Uses the kudu:thirdparty image as a base.
 
 === apache/kudu:thirdparty-[OS]-[VERSION]
 An image that has Kudu's thirdparty dependencies built.
-Uses the kudu:base image as a base.
+Uses the kudu:dev image as a base.
 
-=== apache/kudu:base-[OS]-[VERSION]
-A base image that has all prerequisite libraries for development and runtime
-pre-installed.
+=== apache/kudu:dev-[OS]-[VERSION]
+A base image that has all the development libraries for Kudu pre-installed.
+
+=== apache/kudu:runtime-[OS]-[VERSION]
+A base image that has all the runtime libraries for Kudu pre-installed.
 
 == Tips and Troubleshooting
 
diff --git a/docker/bootstrap-env.sh b/docker/bootstrap-dev-env.sh
similarity index 100%
rename from docker/bootstrap-env.sh
rename to docker/bootstrap-dev-env.sh
diff --git a/docker/bootstrap-runtime-env.sh b/docker/bootstrap-runtime-env.sh
new file mode 100755
index 0000000..9188d8d
--- /dev/null
+++ b/docker/bootstrap-runtime-env.sh
@@ -0,0 +1,73 @@
+#!/bin/bash
+##########################################################
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+# This script handles bootstrapping a base OS for
+# the Apache Kudu base docker images.
+#
+##########################################################
+
+set -xe
+
+# Install the required runtime packages, if they are not installed.
+# CentOS/RHEL
+if [[ -f "/usr/bin/yum" ]]; then
+  # Update the repo.
+  yum update -y
+
+  # Install runtime packages.
+  yum install -y \
+    cyrus-sasl-gssapi \
+    cyrus-sasl-plain \
+    krb5-server \
+    krb5-workstation \
+    nscd \
+    openssl
+
+  # Reduce the image size by cleaning up after the install.
+  yum clean all
+  rm -rf /var/cache/yum /tmp/* /var/tmp/*
+# Ubuntu/Debian
+elif [[ -f "/usr/bin/apt-get" ]]; then
+  # Ensure the Debian frontend is noninteractive.
+  export DEBIAN_FRONTEND=noninteractive
+
+  # Update the repo.
+  apt-get update -y
+
+  # Install runtime packages.
+  # --no-install-recommends keeps the install smaller
+  apt-get install -y --no-install-recommends \
+    krb5-admin-server \
+    krb5-kdc \
+    krb5-user \
+    libsasl2-modules \
+    libsasl2-modules-gssapi-mit \
+    nscd \
+    openssl
+
+  # Reduce the image size by cleaning up after the install.
+  apt-get clean
+  rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
+
+  unset DEBIAN_FRONTEND
+else
+  echo "Unsupported OS"
+  exit 1
+fi
\ No newline at end of file
diff --git a/docker/docker-build.sh b/docker/docker-build.sh
index e90da5d..eb168e8 100755
--- a/docker/docker-build.sh
+++ b/docker/docker-build.sh
@@ -49,16 +49,31 @@
 # will be generated when the default operating system is used.
 #
 # Environment variables may be used to customize operation:
-#   BASE_OS: Default: ubuntu:xenial
-#     The base image to use.
 #
-#   REPOSITORY: Default: apache/kudu
+#   BASES: Default: "ubuntu:xenial"
+#     A comma-separated list of base operating systems to build with.
+#
+#   TARGETS: Default: "kudu"
+#     A comma-separated list of targets to build and tag.
+#     These targets are defined in the Dockerfile.
+#     Targets that a passed target depends on will be built, but
+#     not tagged. Note that an untagged image is subject to
+#     removal by Docker's system and image pruning.
+#
+#   REPOSITORY: Default: "apache/kudu"
 #     The repository string to use when tagging the image.
 #
-#   TAG_LATEST: Default: 1
+#   TAG_LATEST: Default: "1"
 #     If set to 1, adds a tag using `-latest` along with the
 #     versioned tag.
 #
+#   TAG_HASH: Default: "0"
+#     If set to 1, keeps the tags that use the short git hash as
+#     the version for non-release builds. Leaving this as 0
+#     removes the tags containing the short git hash, which
+#     keeps the `docker images` list cleaner when only the
+#     latest image is relevant.
+#
 #   DOCKER_CACHE_FROM:
 #      Optional images passed to the `docker build` commands
 #      via the `--cache-from` option. This option tells docker
@@ -76,40 +91,17 @@ ROOT=$(cd $(dirname "$BASH_SOURCE")/.. ; pwd)
 #   ubuntu:trusty
 #   ubuntu:xenial
 #   ubuntu:bionic
-DEFAULTS_OS="ubuntu:xenial"
-BASE_OS=${BASE_OS:="$DEFAULTS_OS"}
+DEFAULT_OS="ubuntu:xenial"
+BASES=${BASES:="$DEFAULT_OS"}
+TARGETS=${TARGETS:="kudu"}
 REPOSITORY=${REPOSITORY:="apache/kudu"}
 TAG_LATEST=${TAG_LATEST:=1}
+TAG_HASH=${TAG_HASH:=0}
 DOCKER_CACHE_FROM=${DOCKER_CACHE_FROM:=""}
-TARGETS=("base" "thirdparty" "build" "kudu")
 
-VERSION=$(cat $ROOT/version.txt)
+VERSION=$(cat "$ROOT/version.txt")
 VCS_REF=$(git rev-parse --short HEAD)
 
-BUILD_ARGS=(
-  --build-arg BASE_OS="$BASE_OS"
-  --build-arg DOCKERFILE="docker/Dockerfile"
-  --build-arg MAINTAINER="Apache Kudu <de...@kudu.apache.org>"
-  --build-arg URL="https://kudu.apache.org"
-  --build-arg VERSION=$VERSION
-  --build-arg VCS_REF=$VCS_REF
-  --build-arg VCS_TYPE="git"
-  --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"
-)
-
-if [[ -n "$DOCKER_CACHE_FROM" ]]; then
-  BUILD_ARGS+=(--cache-from "$DOCKER_CACHE_FROM")
-fi
-
-# Create the OS_TAG.
-OS_NAME=$(echo "$BASE_OS" | cut -d':' -f1)
-OS_VERSION=$(echo "$BASE_OS" | cut -d':' -f2)
-if [[ "$OS_VERSION" == [[:digit:]]* ]]; then
-  OS_TAG="$OS_NAME$OS_VERSION"
-else
-  OS_TAG="$OS_VERSION"
-fi
-
 # Create the VERSION_TAG.
 if [[ "$VERSION" == *-SNAPSHOT ]]; then
   IS_RELEASE_VERSION=0
@@ -120,6 +112,21 @@ else
   MINOR_VERSION_TAG=$(echo "$VERSION" | sed "s/.[^.]*$//")
 fi
 
+# Constructs an OS tag from the passed BASE_IMAGE (e.g. ubuntu:xenial).
+# By default the operating system is described by its version name
+# (e.g. "xenial"). If the version is numeric, it is appended to the
+# OS name instead (e.g. "centos:7" becomes "centos7").
+function get_os_tag() {
+  local BASE_IMAGE=$1
+  local OS_NAME=$(echo "$BASE_IMAGE" | cut -d':' -f1)
+  local OS_VERSION=$(echo "$BASE_IMAGE" | cut -d':' -f2)
+  if [[ "$OS_VERSION" == [[:digit:]]* ]]; then
+    echo "$OS_NAME$OS_VERSION"
+  else
+    echo "$OS_VERSION"
+  fi
+}
+
 # Constructs a tag, excluding the OS_TAG if it is empty.
 # Additionally ignores the target when it is the default target "kudu".
 # Examples:
@@ -144,33 +151,57 @@ function get_tag() {
   echo "$REPOSITORY:$TAG"
 }
 
-for TARGET in "${TARGETS[@]}"; do
-  FULL_TAG=$(get_tag "$TARGET" "$VERSION_TAG" "$OS_TAG")
+for BASE_OS in $(echo "$BASES" | tr ',' '\n'); do
+  # Generate the arguments to pass to the docker build.
+  BUILD_ARGS=(
+    --build-arg BASE_OS="$BASE_OS"
+    --build-arg DOCKERFILE="docker/Dockerfile"
+    --build-arg MAINTAINER="Apache Kudu <de...@kudu.apache.org>"
+    --build-arg URL="https://kudu.apache.org"
+    --build-arg VERSION="$VERSION"
+    --build-arg VCS_REF="$VCS_REF"
+    --build-arg VCS_TYPE="git"
+    --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"
+  )
+  if [[ -n "$DOCKER_CACHE_FROM" ]]; then
+    BUILD_ARGS+=(--cache-from "$DOCKER_CACHE_FROM")
+  fi
+  OS_TAG=$(get_os_tag "$BASE_OS")
 
-  # Build the target and tag with the full tag.
-  docker build "${BUILD_ARGS[@]}" -f $ROOT/docker/Dockerfile \
-    --target "$TARGET" -t "$FULL_TAG" ${ROOT}
+  for TARGET in $(echo "$TARGETS" | tr ',' '\n'); do
+    FULL_TAG=$(get_tag "$TARGET" "$VERSION_TAG" "$OS_TAG")
 
-  # If this is the default OS, also tag it without the OS-specific tag.
-  if [[ "$BASE_OS" == "$DEFAULTS_OS" ]]; then
-    docker tag "$FULL_TAG" "$(get_tag "$TARGET" "$VERSION_TAG" "")"
-  fi
+    # Build the target and tag with the full tag.
+    docker build "${BUILD_ARGS[@]}" -f "$ROOT/docker/Dockerfile" \
+      --target "$TARGET" -t "$FULL_TAG" "$ROOT"
 
-  # Add the minor version tag if this is a release version.
-  if [[ "$IS_RELEASE_VERSION" == "1" ]]; then
-    docker tag "$FULL_TAG" "$(get_tag "$TARGET" "$MINOR_VERSION_TAG" "$OS_TAG")"
-    # Add the default OS tag.
-    if [[ "$BASE_OS" == "$DEFAULTS_OS" ]]; then
-      docker tag "$FULL_TAG" "$(get_tag "$TARGET" "$MINOR_VERSION_TAG" "")"
+    # If this is the default OS, also tag it without the OS-specific tag.
+    if [[ "$BASE_OS" == "$DEFAULT_OS" ]]; then
+      docker tag "$FULL_TAG" "$(get_tag "$TARGET" "$VERSION_TAG" "")"
     fi
-  fi
 
-  # Add the latest version tags.
-  if [[ "$TAG_LATEST" == "1" ]]; then
-    docker tag "$FULL_TAG" "$(get_tag "$TARGET" "latest" "$OS_TAG")"
-    # Add the default OS tag.
-    if [[ "$BASE_OS" == "$DEFAULTS_OS" ]]; then
-      docker tag "$FULL_TAG" "$(get_tag "$TARGET" "latest" "")"
+    # Add the minor version tag if this is a release version.
+    if [[ "$IS_RELEASE_VERSION" == "1" ]]; then
+      docker tag "$FULL_TAG" "$(get_tag "$TARGET" "$MINOR_VERSION_TAG" "$OS_TAG")"
+      # Add the default OS tag.
+      if [[ "$BASE_OS" == "$DEFAULT_OS" ]]; then
+        docker tag "$FULL_TAG" "$(get_tag "$TARGET" "$MINOR_VERSION_TAG" "")"
+      fi
     fi
-  fi
+
+    # Add the latest version tags.
+    if [[ "$TAG_LATEST" == "1" ]]; then
+      docker tag "$FULL_TAG" "$(get_tag "$TARGET" "latest" "$OS_TAG")"
+      # Add the default OS tag.
+      if [[ "$BASE_OS" == "$DEFAULT_OS" ]]; then
+        docker tag "$FULL_TAG" "$(get_tag "$TARGET" "latest" "")"
+      fi
+    fi
+
+    # Remove the hash tags if they aren't wanted (non-release builds only).
+    if [[ "$TAG_HASH" != "1" && "$IS_RELEASE_VERSION" != "1" ]]; then
+      HASH_TAG_PATTERN="$REPOSITORY:*$VCS_REF*"
+      docker rmi $(docker images -q "$HASH_TAG_PATTERN" --format "{{.Repository}}:{{.Tag}}")
+    fi
+  done
 done
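The new build loop splits BASES on commas and derives an OS tag for each base image with `get_os_tag`. A minimal sketch of that mapping, reusing the function body from the diff; the multi-OS BASES value is a hypothetical example, not a list of supported images:

```shell
#!/usr/bin/env bash
# Same logic as get_os_tag in docker-build.sh: numeric versions are
# appended to the OS name, named versions are used on their own.
get_os_tag() {
  local BASE_IMAGE=$1
  local OS_NAME=$(echo "$BASE_IMAGE" | cut -d':' -f1)
  local OS_VERSION=$(echo "$BASE_IMAGE" | cut -d':' -f2)
  if [[ "$OS_VERSION" == [[:digit:]]* ]]; then
    echo "$OS_NAME$OS_VERSION"
  else
    echo "$OS_VERSION"
  fi
}

# Hypothetical multi-OS value; the script's default is "ubuntu:xenial".
BASES="ubuntu:xenial,ubuntu:bionic,centos:7"

# Same splitting approach as the script: commas become newlines and the
# unquoted command substitution is word-split by the for loop.
for BASE_OS in $(echo "$BASES" | tr ',' '\n'); do
  echo "$BASE_OS -> $(get_os_tag "$BASE_OS")"
done
# ubuntu:xenial -> xenial
# ubuntu:bionic -> bionic
# centos:7 -> centos7
```

TARGETS is split the same way inside this loop, so every base/target pair gets its own `docker build --target` invocation. Since the split relies on word splitting, entries must not contain whitespace.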