You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Michael Brown (Code Review)" <ge...@cloudera.org> on 2017/01/05 18:59:34 UTC

[Impala-ASF-CR] IMPALA-4351,IMPALA-4353: [qgen] randomly generate INSERT statements

Michael Brown has uploaded a new patch set (#6).

Change subject: IMPALA-4351,IMPALA-4353: [qgen] randomly generate INSERT statements
......................................................................

IMPALA-4351,IMPALA-4353: [qgen] randomly generate INSERT statements

- Generate INSERT statements that are either INSERT ... VALUES or INSERT
  ... SELECT

- On both types of INSERTs, we either insert into all columns, or into
  some column list. If the column list exists, all primary keys will be
  present, and 0 or more additional columns will also be in the list.
  The ordering of the column list is random.

- For INSERT ... SELECT, occasionally generate a WITH clause

- For INSERT ... VALUES, generate non-null constants for the primary
  keys, but for the non-primary keys, randomly generate a value
  expression.

The type system in the random statement/query generator isn't
sophisticated enough to the implicit type of a SELECT item or a value
expression. It knows it will be some INT-based type, but not if it's
going to be a SMALLINT or a BIGINT. To get around this, the easiest
thing seems to be to explicitly cast the SELECT items or value
expressions to the columns' so-called exact_type attribute.

Much of the testing here involved running discrepancy_searcher.py
--explain-only on both tpch_kudu and a random HDFS table, using both the
default profile and DML-only profile. This was done to quickly find bugs
in the statement generation, as they tend to bubble up as analysis
errors. I expect to make other changes as follow on patches and more
random statements find small test issues.

For actual use against Kudu data, you need to migrate data from Kudu
into PostgreSQL 5 (instructions tests/comparison/POSTGRES.txt) and run
something like:

tests/comparison/discrepancy_searcher.py \
  --use-postgresql \
  --postgresql-port 5433 \
  --profile dmlonly \
  --timeout 300 \
  --db-name tpch_kudu \
  --query-count 10

Change-Id: I842b41f0eed07ab30ec76d8fc3cdd5affb525af6
---
M tests/comparison/common.py
M tests/comparison/db_connection.py
M tests/comparison/funcs.py
M tests/comparison/model_translator.py
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
M tests/comparison/statement_generator.py
M tests/comparison/tests/query_object_testdata.py
8 files changed, 263 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/5486/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I842b41f0eed07ab30ec76d8fc3cdd5affb525af6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>