Posted to commits@impala.apache.org by ta...@apache.org on 2018/07/30 18:33:43 UTC

impala git commit: IMPALA-7170: Update data_generator.py for Hadoop 3

Repository: impala
Updated Branches:
  refs/heads/master b5608264b -> 10a67509f


IMPALA-7170: Update data_generator.py for Hadoop 3

After the move to Hadoop 3, data_generator.py was broken. The issue
seems to be that it relies on additional jars that are not on the
classpath. The solution is to pass the location of these jars to the
'hadoop' command using the '-libjars' parameter.

This patch also updates tests/comparison/README to add instructions
for dealing with Yarn, since during the move to Hadoop 3 we switched
to no longer running Yarn as part of the minicluster by default.

Change-Id: I47b7d663174dbd38a5d9c98f1a88f0ebab726d5a
Reviewed-on: http://gerrit.cloudera.org:8080/11041
Reviewed-by: Thomas Marshall <th...@cmu.edu>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/10a67509
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/10a67509
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/10a67509

Branch: refs/heads/master
Commit: 10a67509f283e1434aaed0f5d5e03937d3b76aa9
Parents: b560826
Author: Thomas Tauber-Marshall <tm...@cloudera.com>
Authored: Tue Jul 24 19:59:50 2018 +0000
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Sun Jul 29 02:25:30 2018 +0000

----------------------------------------------------------------------
 tests/comparison/README            | 6 +++++-
 tests/comparison/data_generator.py | 4 +++-
 2 files changed, 8 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/10a67509/tests/comparison/README
----------------------------------------------------------------------
diff --git a/tests/comparison/README b/tests/comparison/README
index d3e86df..057f325 100644
--- a/tests/comparison/README
+++ b/tests/comparison/README
@@ -11,7 +11,11 @@ slower but has a larger coverage area.
 
 Requirements:
 
-1) It's assumed that Impala is running locally.
+1) It's assumed that Impala is running locally. The minicluster should either be run with
+   Yarn (by setting INCLUDE_YARN=true and running ./buildall.sh -start_minicluster), or
+   mapreduce should be configured to use local mode (by modifying mapreduce.framework.name
+   in testdata/cluster/node_templates/common/etc/hadoop/conf/mapred-site.xml to 'local'
+   and running ./buildall.sh -start_minicluster)
 
 2) Impyla -- an implementation of DB API 2 for Impala.
 

http://git-wip-us.apache.org/repos/asf/impala/blob/10a67509/tests/comparison/data_generator.py
----------------------------------------------------------------------
diff --git a/tests/comparison/data_generator.py b/tests/comparison/data_generator.py
index f1b10cd..a52ce6f 100755
--- a/tests/comparison/data_generator.py
+++ b/tests/comparison/data_generator.py
@@ -236,12 +236,14 @@ class DbPopulator(object):
     self.cluster.yarn.run_mr_job(self.cluster.yarn.find_mr_streaming_jar(), job_args=r'''
         -D mapred.reduce.tasks=%s \
         -D stream.num.map.output.key.fields=2 \
+        -libjars '%s/share/hadoop/hdfs/lib/*' \
         -files %s \
         -input %s \
         -output %s \
         -mapper data_generator_mapper.py \
         -reducer data_generator_reducer.py'''.strip()
-        % (reducer_count, ','.join(files), mapper_input_file, hdfs_output_dir))
+        % (reducer_count, os.environ["HADOOP_HOME"], ','.join(files), mapper_input_file,
+           hdfs_output_dir))
 
 
 if __name__ == '__main__':
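
For readers who want to inspect the resulting job arguments outside the
minicluster, the patched formatting can be sketched as a standalone helper.
This is only an illustration under stated assumptions: the function name
build_streaming_job_args, its hadoop_home parameter, and the explicit check
for a missing HADOOP_HOME are hypothetical and not part of the patch, which
reads os.environ["HADOOP_HOME"] directly inside DbPopulator.

```python
import os


def build_streaming_job_args(reducer_count, files, mapper_input_file, hdfs_output_dir,
                             hadoop_home=None):
  """Return the streaming job argument string, mirroring the patched call.

  Hypothetical helper for illustration; the patch formats this string inline.
  """
  if hadoop_home is None:
    # Assumption: fall back to the environment, as the patch does.
    hadoop_home = os.environ.get("HADOOP_HOME")
  if not hadoop_home:
    # Not in the patch: fail with a clear message rather than a bare KeyError.
    raise RuntimeError("HADOOP_HOME must be set to locate the HDFS lib jars")
  return r'''
      -D mapred.reduce.tasks=%s \
      -D stream.num.map.output.key.fields=2 \
      -libjars '%s/share/hadoop/hdfs/lib/*' \
      -files %s \
      -input %s \
      -output %s \
      -mapper data_generator_mapper.py \
      -reducer data_generator_reducer.py'''.strip() % (
          reducer_count, hadoop_home, ','.join(files), mapper_input_file,
          hdfs_output_dir)
```

Printing the result of a call such as
build_streaming_job_args(4, ['a.py', 'b.py'], '/in', '/out', hadoop_home='/opt/hadoop')
shows whether the '-libjars' path points at the expected directory before the
job is submitted.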