Posted to commits@impala.apache.org by sa...@apache.org on 2018/04/23 17:39:25 UTC

[7/9] impala git commit: IMPALA-6898: Avoid duplicate Kudu load during full dataload

IMPALA-6898: Avoid duplicate Kudu load during full dataload

testdata/bin/create-load-data.sh runs bin/load-data.py for
functional/exhaustive, tpch/core, and tpcds/core in a
first phase, then loads functional and tpch for Kudu
in a second phase. For a full dataload, this second phase
is unnecessary, because functional/exhaustive and tpch/core
already include Kudu.

This avoids the second phase when doing a full dataload.
The second phase is still necessary when loading from
a snapshot, and this does not change that behavior.

This saves a couple of minutes on a full dataload.
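
The gating described above can be sketched in isolation as follows. This is
a minimal illustration, not code from create-load-data.sh: the function name
needs_kudu_reload is hypothetical, and the sketch assumes that
SKIP_METADATA_LOAD is 1 when the load is not a full dataload (for example,
when loading from a snapshot) and that KUDU_IS_SUPPORTED holds the string
"true" or "false".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of create-load-data.sh): succeeds (exit 0)
# when the separate second-phase Kudu load should run.
#   $1: SKIP_METADATA_LOAD  (assumed: 1 = not a full dataload, 0 = full dataload)
#   $2: KUDU_IS_SUPPORTED   (assumed: the string "true" or "false")
needs_kudu_reload() {
  local skip_metadata_load="$1"
  local kudu_is_supported="$2"
  # Compare the flag against "true" explicitly: inside [[ ]], a bare
  # non-empty word (including "false") evaluates as true.
  [[ "${skip_metadata_load}" -eq 1 && "${kudu_is_supported}" == "true" ]]
}

# Walk through the three interesting combinations.
for args in "0 true" "1 true" "1 false"; do
  # Intentionally unquoted so each pair splits into two arguments.
  if needs_kudu_reload ${args}; then
    echo "${args}: run second Kudu load phase"
  else
    echo "${args}: skip second Kudu load phase"
  fi
done
```

Under these assumptions, only the combination of a non-full dataload and
Kudu support triggers the second phase; a full dataload skips it because
the first phase has already loaded the Kudu tables.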

Change-Id: Ic023d230f99126ed37795106c38faae5f0cb608e
Reviewed-on: http://gerrit.cloudera.org:8080/10128
Reviewed-by: Philip Zeyliger <ph...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/5bc5279b
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/5bc5279b
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/5bc5279b

Branch: refs/heads/master
Commit: 5bc5279b07451f8c6fb8af29ef83127dc7785440
Parents: 4dc3d34
Author: Joe McDonnell <jo...@cloudera.com>
Authored: Thu Apr 19 16:14:03 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Sat Apr 21 01:08:50 2018 +0000

----------------------------------------------------------------------
 testdata/bin/create-load-data.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/5bc5279b/testdata/bin/create-load-data.sh
----------------------------------------------------------------------
diff --git a/testdata/bin/create-load-data.sh b/testdata/bin/create-load-data.sh
index e50515b..51ba449 100755
--- a/testdata/bin/create-load-data.sh
+++ b/testdata/bin/create-load-data.sh
@@ -540,8 +540,10 @@ elif [ "${TARGET_FILESYSTEM}" = "hdfs" ];  then
       load-data "functional-query" "core" "hbase/none"
 fi
 
-if $KUDU_IS_SUPPORTED; then
+if [[ $SKIP_METADATA_LOAD -eq 1 && $KUDU_IS_SUPPORTED ]]; then
   # Tests depend on the kudu data being clean, so load the data from scratch.
+  # This is only necessary if this is not a full dataload, because a full dataload
+  # already loads Kudu functional and TPC-H tables from scratch.
   run-step-backgroundable "Loading Kudu functional" load-kudu.log \
         load-data "functional-query" "core" "kudu/none/none" force
   run-step-backgroundable "Loading Kudu TPCH" load-kudu-tpch.log \