Posted to commits@impala.apache.org by cs...@apache.org on 2019/06/07 12:54:12 UTC

[impala] 01/02: Fix integration of kudu-hive.jar

This is an automated email from the ASF dual-hosted git repository.

csringhofer pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 90d84425292532bd0e18aaa9851cc4ca01fd4bce
Author: Thomas Tauber-Marshall <tm...@cloudera.com>
AuthorDate: Thu Jun 6 10:53:32 2019 -0700

    Fix integration of kudu-hive.jar
    
    IMPALA-8503 added downloading kudu-hive.jar and adding it to
    HADOOP_CLASSPATH in run-hive-server.sh to allow the Hive Metastore to
    start with Kudu's HMS plugin.
    
    There are two problems with this that are fixed by this patch:
    - Previously, we fully specified the expected jar filename based on the
      value of IMPALA_KUDU_JAVA_VERSION when adding it to HADOOP_CLASSPATH,
      but this is overly restrictive for users who may wish to override
      this value in impala-config-branch.sh to build their own branch with
      a different version of the kudu-hive.jar. This patch relaxes this
      restriction by adding any jar containing the string kudu-hive in
      IMPALA_KUDU_JAVA_HOME to HADOOP_CLASSPATH.
    - In bootstrap_toolchain, we don't download a package if its directory
      already exists. Since the 'kudu' and 'kudu-java' packages download
      to the same directory, this led to a race condition where
      'kudu-java' might not be downloaded if 'kudu' had already been
      unpacked when it started. This patch fixes this by inspecting the
      contents of the Kudu package directory to look for specific files
      expected for each Kudu package.
    
    Change-Id: I4ac79c3e9b8625ba54145dba23c69fd5117f35c7
    Reviewed-on: http://gerrit.cloudera.org:8080/13542
    Reviewed-by: Thomas Marshall <tm...@cloudera.com>
    Reviewed-by: Hao Hao <ha...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 bin/bootstrap_toolchain.py      | 19 +++++++++++++++++--
 bin/impala-config.sh            |  1 +
 testdata/bin/run-hive-server.sh |  6 +++---
 3 files changed, 21 insertions(+), 5 deletions(-)
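
Note on the bootstrap_toolchain change: the per-package check described in the commit message can be sketched in isolation as below. This is a minimal illustration, not the code in bin/bootstrap_toolchain.py; the helper name package_present and the example directory name are hypothetical, and only the three checks (a jar for 'kudu-java', a 'debug' directory for 'kudu', plain directory existence otherwise) come from the patch.

```python
import glob
import os
import tempfile


def package_present(pkg_directory, component_name):
    """Return True if the files expected for component_name already exist.

    Hypothetical helper: it mirrors the checks in the patch, which returns
    early from the download step instead of returning a boolean.
    """
    if component_name == "kudu-java":
        # The kudu-java tarball is expected to contribute at least one jar.
        return len(glob.glob(os.path.join(pkg_directory, "*jar"))) > 0
    if component_name == "kudu":
        # Per the patch comment, the kudu tarball always has a 'debug' dir.
        return os.path.exists(os.path.join(pkg_directory, "debug"))
    # Any other component owns its directory, so existence is sufficient.
    return os.path.isdir(pkg_directory)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    pkg = os.path.join(root, "kudu-example")  # assumed directory name
    os.makedirs(os.path.join(pkg, "debug"))   # simulate 'kudu' unpacked first
    print(package_present(pkg, "kudu"))       # True
    print(package_present(pkg, "kudu-java"))  # False: still needs download
```

With a plain os.path.isdir check, the second call would also report True, which is the race the commit message describes.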

diff --git a/bin/bootstrap_toolchain.py b/bin/bootstrap_toolchain.py
index ae0b7d3..6be838f 100755
--- a/bin/bootstrap_toolchain.py
+++ b/bin/bootstrap_toolchain.py
@@ -40,6 +40,7 @@
 #
 #     python bootstrap_toolchain.py
 import logging
+import glob
 import multiprocessing.pool
 import os
 import random
@@ -418,10 +419,24 @@ def download_cdh_components(toolchain_root, cdh_components, url_prefix):
       component_name = component.name
       if component.name == "kudu-java":
         component_name = "kudu"
+
+      # Check if the directory already exists, and skip downloading it if it does. Since
+      # the kudu and kudu-java tarballs unpack to the same directory, we check for files
+      # in that directory expected for each package. TODO: if we change how the Kudu
+      # tarballs are packaged we can remove this special case.
       pkg_directory = package_directory(cdh_components_home, component_name,
           component.version)
-      if os.path.isdir(pkg_directory):
-        return
+      if component.name == "kudu-java":
+        if len(glob.glob("%s/*jar" % pkg_directory)) > 0:
+          return
+      elif component.name == "kudu":
+        # Regardless of the actual build type, the 'kudu' tarball will always contain a
+        # 'debug' and a 'release' directory.
+        if os.path.exists(os.path.join(pkg_directory, "debug")):
+          return
+      else:
+        if os.path.isdir(pkg_directory):
+          return
 
       platform_label = ""
       # Kudu is the only component that's platform dependent.
diff --git a/bin/impala-config.sh b/bin/impala-config.sh
index 7851ff2..2b01c7b 100755
--- a/bin/impala-config.sh
+++ b/bin/impala-config.sh
@@ -669,6 +669,7 @@ else
   export IMPALA_KUDU_VERSION=${IMPALA_KUDU_VERSION-"84086fe"}
   export IMPALA_KUDU_HOME=${IMPALA_TOOLCHAIN}/kudu-$IMPALA_KUDU_VERSION
 fi
+export IMPALA_KUDU_JAVA_HOME=${CDH_COMPONENTS_HOME}/kudu-$IMPALA_KUDU_VERSION
 
 # Set $THRIFT_HOME to the Thrift directory in toolchain.
 export THRIFT_HOME="${IMPALA_TOOLCHAIN}/thrift-${IMPALA_THRIFT_VERSION}"
diff --git a/testdata/bin/run-hive-server.sh b/testdata/bin/run-hive-server.sh
index daa7bad..e53c58f 100755
--- a/testdata/bin/run-hive-server.sh
+++ b/testdata/bin/run-hive-server.sh
@@ -102,9 +102,9 @@ fi
 
 # Add kudu-hive.jar to the Hive Metastore classpath, so that Kudu's HMS
 # plugin can be loaded.
-FILE_NAME="${CDH_COMPONENTS_HOME}/kudu-${IMPALA_KUDU_JAVA_VERSION}/\
-kudu-hive-${IMPALA_KUDU_JAVA_VERSION}.jar"
-export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${FILE_NAME}
+for file in ${IMPALA_KUDU_JAVA_HOME}/*kudu-hive*jar; do
+  export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${file}
+done
 # Default to skip validation on Kudu tables if KUDU_SKIP_HMS_PLUGIN_VALIDATION
 # is unset.
 export KUDU_SKIP_HMS_PLUGIN_VALIDATION=${KUDU_SKIP_HMS_PLUGIN_VALIDATION:-1}
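
Note on the run-hive-server.sh change: the relaxed jar matching can be modeled in Python as below, to show which files the *kudu-hive*jar pattern picks up. This is a sketch under stated assumptions, not part of the patch; the function name kudu_hive_classpath is hypothetical, and sorting the matches is an addition for deterministic output.

```python
import glob
import os


def kudu_hive_classpath(classpath, kudu_java_home):
    """Append every jar in kudu_java_home matching *kudu-hive*jar.

    Hypothetical helper modeling the shell loop in run-hive-server.sh.
    """
    pattern = os.path.join(kudu_java_home, "*kudu-hive*jar")
    # glob.glob returns an empty list when nothing matches, so the
    # classpath is returned unchanged in that case.
    for jar in sorted(glob.glob(pattern)):
        classpath = classpath + ":" + jar
    return classpath
```

One difference from the shell loop is worth keeping in mind: with bash's default settings (nullglob unset), a pattern that matches nothing is left unexpanded, so the loop would run once and append the literal pattern string. The Python version appends nothing in that case.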