Posted to reviews@impala.apache.org by "David Knupp (Code Review)" <ge...@cloudera.org> on 2016/10/31 03:33:33 UTC

[Impala-ASF-CR] IMPALA-4365: Enabling end-to-end tests on a remote cluster

David Knupp has uploaded a new patch set (#7).

Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster
......................................................................

IMPALA-4365: Enabling end-to-end tests on a remote cluster

This patch lays the groundwork for loading data and running end-to-end
tests on a remote CDH cluster. The requirements for the cluster to run
the tests are:

  - Managed by Cloudera Manager (CM)
  - GPL Extras installed
  - KMS and KeyTrustee installed and available as a service
  - SERDEPROPERTIES in the Hive DB modified to accept wide tables
  - Hive warehouse dir points to /test-warehouse

The actual data loading is done via a new script, remote_data_load.py,
which takes the CM host as an argument. It can be run from a client
machine that is not a node of the cluster, but that machine needs to
have the Impala repo checked out and Impala built. This ensures that
all of the necessary data load scripts are available and that the
environment is set up properly (client binaries like beeline and the
hbase shell are available, python libraries like cm_api are installed,
necessary environment variables are defined, etc.).

Note that running remote_data_load.py overwrites any local XML config
files with the configurations downloaded from the remote cluster.

Usage: remote_data_load.py [options] <cm_host address>

Options:
  -h, --help            show this help message and exit
  --snapshot-file=SNAPSHOT_FILE
                        Path to the test-warehouse archive
  --cm-user=CM_USER     Cloudera Manager admin user
  --cm-pass=CM_PASS     Cloudera Manager admin user password
  --gateway=GATEWAY     Gateway host to upload the data from. If not
                        set, uses the CM host as gateway.
  --ssh-user=SSH_USER   System user on the remote machine with
                        passwordless SSH configured.
  --no-load             Do not try to load the snapshot
  --exploration-strategy=EXPLORATION_STRATEGY
  --test                Run end-to-end tests against cluster
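
For readers who want to see how the options above fit together, the
following is a minimal sketch of a parser that accepts the same
arguments. It is not the code in bin/remote_data_load.py. The use of
optparse is an assumption based on the shape of the help text, the
names build_parser and parse_args are hypothetical, and no defaults
are assumed beyond the documented gateway fallback.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser mirroring the usage text (illustrative only)."""
    parser = OptionParser(usage="%prog [options] <cm_host address>")
    parser.add_option("--snapshot-file", dest="snapshot_file",
                      help="Path to the test-warehouse archive")
    parser.add_option("--cm-user", dest="cm_user",
                      help="Cloudera Manager admin user")
    parser.add_option("--cm-pass", dest="cm_pass",
                      help="Cloudera Manager admin user password")
    parser.add_option("--gateway", dest="gateway",
                      help="Gateway host to upload the data from. If not "
                           "set, uses the CM host as gateway.")
    parser.add_option("--ssh-user", dest="ssh_user",
                      help="System user on the remote machine with "
                           "passwordless SSH configured.")
    parser.add_option("--no-load", dest="no_load", action="store_true",
                      default=False,
                      help="Do not try to load the snapshot")
    # The usage text gives no description or default for this option,
    # so none is invented here.
    parser.add_option("--exploration-strategy", dest="exploration_strategy")
    parser.add_option("--test", dest="test", action="store_true",
                      default=False,
                      help="Run end-to-end tests against cluster")
    return parser


def parse_args(argv):
    """Parse argv (without the program name); return (options, cm_host)."""
    parser = build_parser()
    options, args = parser.parse_args(argv)
    if len(args) != 1:
        parser.error("expected exactly one argument: <cm_host address>")
    cm_host = args[0]
    # Documented behavior: fall back to the CM host when no gateway is set.
    if options.gateway is None:
        options.gateway = cm_host
    return options, cm_host
```

With this sketch, parse_args(["--no-load", "cm.example.com"]) yields
options whose gateway falls back to the CM host; the host name is a
placeholder.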

Testing:

This patch is being submitted with the understanding that there are
still problems to work out with the remote data load script itself.

However, since many of the existing build scripts also had to be
modified, it is more important to make sure that no regressions were
inadvertently introduced into the existing data load process. Loading
data to a local mini-cluster was checked repeatedly while this patch
was being developed, and the patch was also run against the Jenkins
job that provides the test-warehouse snapshot used by the many other
Impala CI builds that run daily.

Remote data loading is working for the most part, although recent
Kudu-related changes have introduced unforeseen problems:

https://github.com/apache/incubator-impala/commit/041fa6d

In the meantime, setting KUDU_IS_SUPPORTED to false provides a
temporary workaround.
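
One way to apply the workaround from a wrapper is sketched below. It
assumes that KUDU_IS_SUPPORTED is read from the process environment
and that the literal value "false" is what the scripts expect. The
helper names workaround_env and data_load_command are hypothetical,
and the host name in the comment is a placeholder.

```python
import os


def workaround_env(base_env=None):
    """Return a copy of the environment with Kudu support disabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the data load scripts read this variable from the
    # environment and treat "false" as "Kudu is not supported".
    env["KUDU_IS_SUPPORTED"] = "false"
    return env


def data_load_command(cm_host, script="bin/remote_data_load.py"):
    """Build the argument list for the remote data load script."""
    return [script, cm_host]


# Example invocation (not executed here):
#   import subprocess
#   subprocess.check_call(data_load_command("cm.example.com"),
#                         env=workaround_env())
```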

Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
---
M bin/load-data.py
A bin/remote_data_load.py
M testdata/bin/compute-table-stats.sh
M testdata/bin/create-load-data.sh
M testdata/bin/create-table-many-blocks.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/load_nested.py
M testdata/bin/run-step.sh
M testdata/bin/setup-hdfs-env.sh
10 files changed, 754 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/4769/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Harrison Sheinblatt <hs...@hotmail.com>
Gerrit-Reviewer: Martin Grund <gr...@gmail.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>