Posted to dev@impala.apache.org by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org> on 2016/09/07 21:22:55 UTC

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4327

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................

IMPALA-3739: Enable stress tests on Kudu

This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The following changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
   TPC-D data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
   script is invoked by the stress test runner to load test data in an
   existing Impala/Kudu cluster (both local and CM-managed clusters are
   supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu.
4. Modified the stress test runner to take additional parameters
   specific to Kudu (e.g. kudu master addr)

The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.

Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
  TPC-DS queries from the testdata/workloads/tpcds/queries directory
  were modified to run against Kudu.

Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
---
A testdata/bin/load-tpc-kudu.py
A testdata/datasets/tpcds/tpcds_kudu_template.sql
A testdata/datasets/tpch/tpch_kudu_template.sql
A testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q27.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q3.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q34.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q42.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q43.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q46.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q52.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q53.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q55.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q59.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q6.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q61.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q63.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q68.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q7.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q73.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q79.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q8.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q88.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q89.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q96.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q98.test
M tests/comparison/db_connection.py
M tests/stress/concurrent_select.py
30 files changed, 2,472 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4327/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4327

to look at the new patch set (#8).

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................

IMPALA-3739: Enable stress tests on Kudu

This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The following changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
   TPC-DS data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
   script is invoked by the stress test runner to load test data in an
   existing Impala/Kudu cluster (both local and CM-managed clusters are
   supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL
   files with TPC-H queries for Kudu were added in a previous patch.
4. Modified the stress test runner to take additional parameters
   specific to Kudu (e.g. kudu master addr)

The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.

Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
  TPC-DS queries from the testdata/workloads/tpcds/queries directory
  were modified to run against Kudu.

Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
---
A testdata/bin/load-tpc-kudu.py
A testdata/datasets/tpcds/tpcds_kudu_template.sql
A testdata/datasets/tpch/tpch_kudu_template.sql
A testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q27.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q3.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q34.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q42.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q43.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q46.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q52.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q53.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q55.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q59.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q6.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q61.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q63.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q68.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q7.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q73.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q79.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q8.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q88.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q89.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q96.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q98.test
M tests/comparison/db_connection.py
M tests/stress/concurrent_select.py
30 files changed, 2,446 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4327/8

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 5:

(2 comments)

It seems regardless of database, Kudu tables live in the same Kudu namespace. In order to have TPCH and TPCDS data both loaded, at least one of the customer tables must have a different name.

But also, it's often convenient to have the same data of varying scales loaded on a stress cluster for various experiments. Since we solve that problem by including the data scale in the database name, I think we could solve these problems in Kudu by including the Impala database as a prefix to each Kudu table.

Suggestion: prepend all the Kudu table names with the database name.

http://gerrit.cloudera.org:8080/#/c/4327/5/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS5, Line 469: 'kudu.table_name'='customer',
Here's one kudu table called 'customer'. Can this (and others) be expanded to "{target_db_name}_customer" ?
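
A minimal sketch of the suggested prefixing, assuming the template is expanded with Python's str.format and that {target_db_name} is the placeholder (the name comes from the comment above; the template fragment, the render helper, and the database names are hypothetical):

```python
# Hypothetical fragment of a Kudu DDL template; only the table property matters here.
TEMPLATE = "'kudu.table_name' = '{target_db_name}_customer',"


def render(template, target_db_name):
    """Expand the placeholder so each Impala database maps to its own Kudu table."""
    return template.format(target_db_name=target_db_name)


# Two databases that both contain a "customer" table no longer collide in Kudu.
tpch_line = render(TEMPLATE, "tpch_kudu_10")
tpcds_line = render(TEMPLATE, "tpcds_kudu_10")
print(tpch_line)
print(tpcds_line)
```

The same prefix would also keep different scale factors of one workload apart, since the scale is already part of the database name.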


http://gerrit.cloudera.org:8080/#/c/4327/5/testdata/datasets/tpch/tpch_kudu_template.sql
File testdata/datasets/tpch/tpch_kudu_template.sql:

PS5, Line 185:   'kudu.table_name' = 'customer',
Here's another kudu table called "customer".



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/bin/load-tpc-kudu.py
File testdata/bin/load-tpc-kudu.py:

PS3, Line 51:     tbls_to_clean = tpch_tables if workload.lower() == 'tpch' else tpcds_tables
Maybe use the cursor to get the list of tables? That way you don't have to hardcode the table names L39-46.
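
Roughly what this could look like, as a sketch only: it assumes the cursor is a DB-API style object with execute() and fetchall(), and that the table name is the first column of each SHOW TABLES row. list_tables and FakeCursor are hypothetical names; FakeCursor exists only so the example runs without a cluster.

```python
def list_tables(cursor, db_name):
    """Return the table names in db_name using a DB-API style cursor."""
    cursor.execute("SHOW TABLES IN %s" % db_name)
    # Each row is a sequence whose first column is assumed to be the table name.
    return [row[0] for row in cursor.fetchall()]


class FakeCursor(object):
    """Stand-in for a real cursor so the sketch runs without a cluster."""

    def __init__(self, rows):
        self._rows = rows
        self.last_sql = None

    def execute(self, sql):
        self.last_sql = sql

    def fetchall(self):
        return list(self._rows)


cursor = FakeCursor([("customer",), ("lineitem",)])
tables = list_tables(cursor, "tpch_kudu")
print(tables)
```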


PS3, Line 81:   sql_file_path = "%s/testdata/datasets/%s/%s_kudu_template.sql"
Use os.path.join() here.
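
For example (a sketch; the function name and the assumption that the three %s slots are the Impala home directory followed by the workload name twice are mine, not taken from the patch):

```python
import os


def kudu_template_path(impala_home, workload):
    """Build <impala_home>/testdata/datasets/<workload>/<workload>_kudu_template.sql."""
    workload = workload.lower()
    return os.path.join(impala_home, "testdata", "datasets", workload,
                        "%s_kudu_template.sql" % workload)


path = kudu_template_path("/opt/impala", "tpch")
print(path)
```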


http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS3, Line 39: 'kudu.key_columns' = 'ss_sold_date_sk,ss_ticket_number, ss_item_sk'
For my education, I looked at http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.3.0.pdf and saw that for this table, the PK is ss_item_sk,ss_ticket_number . Can you explain why ss_sold_date_sk is added as a key column?


http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
File testdata/workloads/tpcds/queries/tpcds-kudu-q19.test:

Line 39: ====
I noticed none of the TPC-DS Kudu queries have RESULTS. Why? (I searched for a TODO and didn't see a reason that might explain it; maybe I missed it.)


http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
File testdata/workloads/tpcds/queries/tpcds-kudu-q47.test:

PS3, Line 33: 	,round(v1_lead.sum_sales, 2) nsum
Nit: tab character.


http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
File testdata/workloads/tpcds/queries/tpcds-kudu-q65.test:

PS3, Line 55: order by
            :   s_store_name,
            :   i_item_desc,
            :   sc.revenue,
            :   i_current_price,
            :   i_wholesale_cost,
            :   i_brand
The ORDER BY has more columns than the TPC-DS-for-HDFS counterpart. Any reason?


http://gerrit.cloudera.org:8080/#/c/4327/3/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS3, Line 1463:     tpch_kudu_queries = load_tpc_queries("tpch", "kudu")
Change "kudu" to load_in_kudu=True


PS3, Line 1468:     tpcds_kudu_queries = load_tpc_queries("tpcds", "kudu")
Change "kudu" to load_in_kudu=True



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/bin/load-tpc-kudu.py
File testdata/bin/load-tpc-kudu.py:

PS3, Line 51:     tbls_to_clean = tpch_tables if workload.lower() == 'tpch' else tpcds_tables
> The change to enable drop db cascade for Kudu is in review. If you're stron
Done


http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
File testdata/workloads/tpcds/queries/tpcds-kudu-q19.test:

Line 39: ====
> The TPC-DS workload is not currently enabled for Kudu in our regular (non-s
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4327/6/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS6, Line 4: ---- TODO: Fix the primary key column order
> Correct. Removed the TODO.
thanks, but should TPCDS have PKs with nulls in the first place? Basically I'm wondering if we're doing something else wrong, i.e. picking the wrong PK or maybe the data is wrong?
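
One cheap way to check the data is a query that counts rows with a NULL in any key column. A sketch, assuming the store_sales key columns from the TPC-DS spec mentioned earlier in this review; the helper name is hypothetical:

```python
def null_key_count_sql(table, key_columns):
    """Build a query that counts rows where any primary key column is NULL."""
    predicate = " OR ".join("%s IS NULL" % col for col in key_columns)
    return "SELECT COUNT(*) FROM %s WHERE %s" % (table, predicate)


sql = null_key_count_sql("store_sales", ["ss_item_sk", "ss_ticket_number"])
print(sql)
```

A nonzero count on the source table would point at either the data or the choice of key.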



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4327/6/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS6, Line 4: ---- TODO: Remove the IS NOT NULL predicates when nullable keys are supported
> That sounds like a bunch of work-- If you don't think there are any nulls, 
I removed the IS NOT NULL predicates and tried a new load + run of the stress test on a new EC2 cluster with no issues. So, I removed them. Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/bin/load-tpc-kudu.py
File testdata/bin/load-tpc-kudu.py:

PS3, Line 51:     tbls_to_clean = tpch_tables if workload.lower() == 'tpch' else tpcds_tables
> Maybe use the cursor to get the list of tables? That way you don't have to 
The change to enable drop db cascade for Kudu is in review. If you're strongly against simply doing a drop db when this is in, I'll implement your proposal.


PS3, Line 81:   sql_file_path = "%s/testdata/datasets/%s/%s_kudu_template.sql"
> Use os.path.join() here.
Done


http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS3, Line 39: 'kudu.key_columns' = 'ss_sold_date_sk,ss_ticket_number, ss_item_sk'
> For my education, I looked at http://www.tpc.org/tpc_documents_current_vers
No intention here, I just got this wrong :) I actually found a few other inconsistencies.


http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
File testdata/workloads/tpcds/queries/tpcds-kudu-q19.test:

Line 39: ====
> I noticed none of the TPC-DS Kudu queries have RESULTS. Why? (I searched fo
The TPC-DS workload is not currently enabled for Kudu in our regular (non-stress) testing framework. It requires some extra work which I left for another patch. Once this is done, these queries will also get proper RESULTS and TYPES sections. For now it seemed kind of pointless to add these sections. Besides the stress test has its own result verification mechanism.


http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
File testdata/workloads/tpcds/queries/tpcds-kudu-q47.test:

PS3, Line 33: 	,round(v1_lead.sum_sales, 2) nsum
> Nit: tab character.
Done


http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
File testdata/workloads/tpcds/queries/tpcds-kudu-q65.test:

PS3, Line 55: order by
            :   s_store_name,
            :   i_item_desc,
            :   sc.revenue,
            :   i_current_price,
            :   i_wholesale_cost,
            :   i_brand
> The ORDER BY has more columns than the TPC-DS-for-HDFS counterpart. Any rea
I had trouble getting consistent results across multiple invocations of this query unless I included these additional columns here. Added a comment.
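
To illustrate the effect outside SQL (a sketch with made-up rows, not the actual query output): when two rows tie on every ORDER BY column, their relative order depends on the order in which they arrive, so two runs can differ unless the sort key covers enough columns to break the tie.

```python
# Two hypothetical result rows that tie on the first three columns.
rows_run1 = [("store_a", "item", 10.0, "brand_x"),
             ("store_a", "item", 10.0, "brand_y")]
# The same rows arriving in a different order, as can happen between runs.
rows_run2 = list(reversed(rows_run1))


def order_by(rows, num_keys):
    """Stable sort on the first num_keys columns, like ORDER BY on those columns."""
    return sorted(rows, key=lambda r: r[:num_keys])


# Sorting on a partial key preserves arrival order for ties, so the runs differ.
partial_differs = order_by(rows_run1, 3) != order_by(rows_run2, 3)
# Sorting on all columns breaks the tie, so both runs agree.
full_matches = order_by(rows_run1, 4) == order_by(rows_run2, 4)
print(partial_differs, full_matches)
```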


http://gerrit.cloudera.org:8080/#/c/4327/3/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS3, Line 1463:     tpch_kudu_queries = load_tpc_queries("tpch", "kudu")
> Change "kudu" to load_in_kudu=True
Good catch, thanks. Done


PS3, Line 1468:     tpcds_kudu_queries = load_tpc_queries("tpcds", "kudu")
> Change "kudu" to load_in_kudu=True
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#5).

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................

IMPALA-3739: Enable stress tests on Kudu

This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The following changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
   TPC-DS data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
   script is invoked by the stress test runner to load test data in an
   existing Impala/Kudu cluster (both local and CM-managed clusters are
   supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL
   files with TPC-H queries for Kudu were added in a previous patch.
4. Modified the stress test runner to take additional parameters
   specific to Kudu (e.g. kudu master addr)

The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.

Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
  TPC-DS queries from the testdata/workloads/tpcds/queries directory
  were modified to run against Kudu.

Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
---
A testdata/bin/load-tpc-kudu.py
A testdata/datasets/tpcds/tpcds_kudu_template.sql
A testdata/datasets/tpch/tpch_kudu_template.sql
A testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q27.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q3.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q34.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q42.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q43.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q46.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q52.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q53.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q55.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q59.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q6.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q61.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q63.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q68.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q7.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q73.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q79.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q8.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q88.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q89.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q96.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q98.test
M tests/comparison/db_connection.py
M tests/stress/concurrent_select.py
30 files changed, 2,470 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4327/5

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 4:

(3 comments)

Thanks for calling these out mikeb.

http://gerrit.cloudera.org:8080/#/c/4327/4/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS4, Line 263: 'kudu.key_columns'='sr_returned_date_sk,sr_ticket_number, sr_item_sk',
> In TPC-DS, the primary keys are sr_item_sk, sr_ticket_number . Is this diff
Done


PS4, Line 575: 'kudu.key_columns'='hd_demo_sk,hd_income_band_sk',
> In TPC-DS only hd_demo_sk is a primary key.
Done


PS4, Line 643: 'kudu.key_columns'='p_promo_sk, p_item_sk',
> In TPC-DS, only p_promo_sk is a primary key.
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 4:

(3 comments)

I took a look at the TPC-H and DS specs for more primary key inconsistencies. If the discrepancies are intentional, can you leave a note? (TPCH looked right to my eyes.)

http://gerrit.cloudera.org:8080/#/c/4327/4/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS4, Line 263: 'kudu.key_columns'='sr_returned_date_sk,sr_ticket_number, sr_item_sk',
In TPC-DS, the primary keys are sr_item_sk, sr_ticket_number . Is this difference correct and intentional?


PS4, Line 575: 'kudu.key_columns'='hd_demo_sk,hd_income_band_sk',
In TPC-DS only hd_demo_sk is a primary key.


PS4, Line 643: 'kudu.key_columns'='p_promo_sk, p_item_sk',
In TPC-DS, only p_promo_sk is a primary key.



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 10: Verified+1


Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 9: Verified-1

Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/324/


Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 1:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4327/1//COMMIT_MSG
Commit Message:

PS1, Line 13: D
ds


Line 18: 3. Created SQL files with TPC-DS queries to be executed in Kudu.
(Comment that TPC-H already exists?)


http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/bin/load-tpc-kudu.py
File testdata/bin/load-tpc-kudu.py:

PS1, Line 50: with
IIRC this syntax breaks on py 2.4, which we shouldn't be using for these tests but i think still barfs if this file gets referenced by other code. This might be fine as long as we're confident this will never get pulled into other modules that end up run as part of normal tests.
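
If 2.4 compatibility ever matters, the usual fallback is try/finally. A sketch, assuming the `with` on that line guards a file handle (the patch may use it for something else); read_all and the temporary file are hypothetical:

```python
import os
import tempfile


def read_all(path):
    """Read a file without the 'with' statement, which Python 2.4 cannot parse."""
    f = open(path)
    try:
        return f.read()
    finally:
        # Runs whether read() succeeds or raises, like leaving a 'with' block.
        f.close()


# Write a small temporary file so the sketch is self-contained.
fd, tmp_path = tempfile.mkstemp()
os.write(fd, b"select 1;\n")
os.close(fd)
contents = read_all(tmp_path)
os.remove(tmp_path)
```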


PS1, Line 96: 'tpch', 'tpcds', 'TPCDS', 'TPCH'
are both cases necessary?
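
One way to avoid listing both cases, sketched with argparse: the `type` callable runs before the value is checked against `choices`, so normalizing with str.lower lets the list hold only lowercase names. The flag names here are assumptions, not the ones in the patch.

```python
import argparse

parser = argparse.ArgumentParser()
# type=str.lower normalizes the value before it is checked against choices,
# so only the lowercase spellings need to be listed.
parser.add_argument("-w", "--workload", type=str.lower, choices=["tpch", "tpcds"],
                    required=True)

args = parser.parse_args(["--workload", "TPCDS"])
print(args.workload)
```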


PS1, Line 100:   parser.add_argument("-b", "--buckets", default="9",
             :       help="Number of buckets to partition Kudu tables (only for hash-based).")
Seems fine for now, but maybe we could have #buckets as a multiple of the #nodes (which I think we can get from one of the test infra classes), at least for the big tables. Maybe dimension tables are always a constant.
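
Something along these lines, as a sketch only: the function name, the per-node multiplier, and the constant for small tables are all hypothetical, and the node count would have to come from the test infrastructure.

```python
def num_buckets(num_nodes, is_large_table, buckets_per_node=2, small_table_buckets=3):
    """Pick a hash bucket count: scale with the cluster for large tables only."""
    if is_large_table:
        return num_nodes * buckets_per_node
    # Dimension tables keep a constant bucket count regardless of cluster size.
    return small_table_buckets


large = num_buckets(8, True)
small = num_buckets(8, False)
print(large, small)
```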


http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

Line 1: ---- Template SQL statements to create and load TPCDS tables in
can you explain a bit about how you picked the PKs? While we probably need to think more about modeling these later, I think we may need to have a way to set the #buckets for big tables and some other value for smaller tables. Maybe we can just have 2 variables for now? I think it matters because it directly affects the number of scan ranges we create, and only having 9 ranges for the bigger tables isn't enough on an 8 node cluster to expose some real issues (I'd think).


PS1, Line 2: KUDU.
prev line


http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/datasets/tpch/tpch_kudu_template.sql
File testdata/datasets/tpch/tpch_kudu_template.sql:

Line 1: ---- Template SQL statements to create and load TPCH tables in
remove the tpch tables in tpch_schema_template.sql?


PS1, Line 2: KUDU
prev line


http://gerrit.cloudera.org:8080/#/c/4327/1/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS1, Line 900: engine=''
I wasn't sure what engine meant until I looked at the usage. I'm wondering if there's another way to do this, maybe after I look at the file structure I'll have an idea. If nothing else, please comment this.


PS1, Line 1382:   if not args.tpcds_db and not args.tpch_db and not args.random_db \
              :       and not args.tpch_nested_db and not args.tpch_kudu_db \
              :       and not args.tpcds_kudu_db and not args.query_file_path:
              :     raise Exception("At least one of --tpcds-db, --tpch-db, --tpch-kudu-db,"
              :         "--tpcds-kudu-db, --tpch-nested-db, --random-db, --query-file-path is required")
Hmm cumbersome... Maybe someone with more python experience knows a better way to handle these (opt/argparse feature?)
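
One possibility, sketched under the assumption that the option names map to attributes in the usual argparse way (dashes become underscores): keep the names in a list and test them with any(), deriving the flag names for the error message from the same list. require_one_of and DB_OPTIONS are hypothetical names.

```python
import argparse

# Attribute names as argparse would store them for the flags quoted above.
DB_OPTIONS = ["tpcds_db", "tpch_db", "tpch_kudu_db", "tpcds_kudu_db",
              "tpch_nested_db", "random_db", "query_file_path"]


def require_one_of(args, names):
    """Raise unless at least one of the named options was given a value."""
    if not any(getattr(args, name, None) for name in names):
        flags = ", ".join("--" + name.replace("_", "-") for name in names)
        raise Exception("At least one of %s is required" % flags)


# Passes: one of the options is set.
require_one_of(argparse.Namespace(tpch_kudu_db="tpch_kudu"), DB_OPTIONS)
```

Adding another database option then means adding one list entry rather than editing both the condition and the message.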


-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4327/6/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS6, Line 4: ---- TODO: Remove the IS NOT NULL predicates when nullable keys are supported
> I haven't verified the data and in theory PK columns shouldn't have nulls. 
That sounds like a bunch of work. If you don't think there are any nulls, it seems easiest to remove it and kick off a test run?

While it doesn't really matter for us when we just throw a lot of queries at the system, I'm just worried we could start using these tables for other things at some point later and get ourselves into trouble.



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 10: Code-Review+2

Fix data loading for Kudu tables and rebase. Carry MJ's +2.


Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Michael Brown, Matthew Jacobs, Internal Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4327

to look at the new patch set (#10).

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................

IMPALA-3739: Enable stress tests on Kudu

This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The following changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
   TPC-DS data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
   script is invoked by the stress test runner to load test data in an
   existing Impala/Kudu cluster (both local and CM-managed clusters are
   supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL
   files with TPC-H queries for Kudu were added in a previous patch.
4. Modified the stress test runner to take additional parameters
   specific to Kudu (e.g. kudu master addr)

The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.

Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
  TPC-DS queries from the testdata/workloads/tpcds/queries directory
  were modified to run against Kudu.

Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
---
A testdata/bin/load-tpc-kudu.py
M testdata/datasets/functional/functional_schema_template.sql
A testdata/datasets/tpcds/tpcds_kudu_template.sql
A testdata/datasets/tpch/tpch_kudu_template.sql
A testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q27.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q3.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q34.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q42.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q43.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q46.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q52.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q53.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q55.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q59.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q6.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q61.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q63.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q68.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q7.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q73.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q79.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q8.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q88.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q89.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q96.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q98.test
M tests/comparison/db_connection.py
M tests/stress/concurrent_select.py
31 files changed, 2,459 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4327/10

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4327/1//COMMIT_MSG
Commit Message:

PS1, Line 13: D
> ds
Done


http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/bin/load-tpc-kudu.py
File testdata/bin/load-tpc-kudu.py:

PS1, Line 50: with
> IIRC this syntax breaks on py 2.4, which we shouldn't be using for these te
Hm, I've seen other scripts (e.g. load_nested.py) already using the same syntax. Maybe Michael has a recommendation here.


PS1, Line 96: 'tpch', 'tpcds', 'TPCDS', 'TPCH'
> are both cases necessary?
I just added it for usability in case someone decides to specify the workload in upper case. Removed.


PS1, Line 100:   parser.add_argument("-b", "--buckets", default="9",
             :       help="Number of buckets to partition Kudu tables (only for hash-based).")
> Seems fine for now, but maybe we could have #buckets as a multiple of the #
Left a TODO for now, so we can revisit later depending on how we want to test this.


http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

Line 1: ---- Template SQL statements to create and load TPCDS tables in
> can you explain a bit about how you picked the PKs? While we probably need 
Good points. In general, I followed the spec in setting the PK columns. Added a TODO to have two different variables for buckets: one for fact tables and one for dimension tables.
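
A minimal sketch of what the two-variable TODO could look like, under stated assumptions: the --fact-buckets and --dim-buckets flags, their defaults, and the helper names are hypothetical and do not exist in load-tpc-kudu.py. FACT_TABLES is only an illustrative subset of the TPC-DS fact tables.

```python
import argparse

# Illustrative subset of TPC-DS fact tables; everything else is treated
# as a dimension table in this sketch.
FACT_TABLES = frozenset(["store_sales", "catalog_sales", "web_sales", "inventory"])


def build_parser():
    """Hypothetical parser with separate bucket counts per table class."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--fact-buckets", type=int, default=32,
        help="Number of hash buckets for fact tables (assumed flag).")
    parser.add_argument("--dim-buckets", type=int, default=9,
        help="Number of hash buckets for dimension tables (assumed flag).")
    return parser


def buckets_for_table(table_name, args):
    """Pick the bucket count for a table based on its class."""
    if table_name.lower() in FACT_TABLES:
        return args.fact_buckets
    return args.dim_buckets
```

Keeping the two values separate would let the larger tables produce more scan ranges without also over-partitioning the small dimension tables.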


PS1, Line 2: KUDU.
> prev line
Done


http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/datasets/tpch/tpch_kudu_template.sql
File testdata/datasets/tpch/tpch_kudu_template.sql:

Line 1: ---- Template SQL statements to create and load TPCH tables in
> remove the tpch tables in tpch_schema_template.sql?
Added a TODO to do this in a follow up patch.


PS1, Line 2: KUDU
> prev line
Done


http://gerrit.cloudera.org:8080/#/c/4327/1/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS1, Line 900: engine=''
> I wasn't sure what engine meant until I looked at the usage. I'm wondering 
Yeah, I over-generalized this one. Changed it to something more explicit. Done


PS1, Line 1382:   if not args.tpcds_db and not args.tpch_db and not args.random_db \
              :       and not args.tpch_nested_db and not args.tpch_kudu_db \
              :       and not args.tpcds_kudu_db and not args.query_file_path:
              :     raise Exception("At least one of --tpcds-db, --tpch-db, --tpch-kudu-db,"
              :         "--tpcds-kudu-db, --tpch-nested-db, --random-db, --query-file-path is required")
> Hmm cumbersome... Maybe someone with more python experience knows a better 
Hm, maybe Michael has a suggestion here.



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4327/1//COMMIT_MSG
Commit Message:

Line 18: 3. Created SQL files with TPC-DS queries to be executed in Kudu.
> (Comment that TPC-H already exists?)
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


IMPALA-3739: Enable stress tests on Kudu

This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The following changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
   TPC-DS data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
   script is invoked by the stress test runner to load test data in an
   existing Impala/Kudu cluster (both local and CM-managed clusters are
   supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL
   files with TPC-H queries for Kudu were added in a previous patch.
4. Modified the stress test runner to take additional parameters
   specific to Kudu (e.g. kudu master addr)

The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.

Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
  TPC-DS queries from the testdata/workloads/tpcds/queries directory
  were modified to run against Kudu.

Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Reviewed-on: http://gerrit.cloudera.org:8080/4327
Reviewed-by: Dimitris Tsirogiannis <dt...@cloudera.com>
Tested-by: Internal Jenkins
---
A testdata/bin/load-tpc-kudu.py
M testdata/datasets/functional/functional_schema_template.sql
A testdata/datasets/tpcds/tpcds_kudu_template.sql
A testdata/datasets/tpch/tpch_kudu_template.sql
A testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q27.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q3.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q34.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q42.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q43.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q46.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q52.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q53.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q55.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q59.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q6.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q61.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q63.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q68.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q7.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q73.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q79.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q8.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q88.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q89.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q96.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q98.test
M tests/comparison/db_connection.py
M tests/stress/concurrent_select.py
31 files changed, 2,459 insertions(+), 6 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Dimitris Tsirogiannis: Looks good to me, approved




Gerrit-MessageType: merged
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................

IMPALA-3739: Enable stress tests on Kudu

This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The following changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
   TPC-DS data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
   script is invoked by the stress test runner to load test data in an
   existing Impala/Kudu cluster (both local and CM-managed clusters are
   supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL
   files with TPC-H queries for Kudu were added in a previous patch.
4. Modified the stress test runner to take additional parameters
   specific to Kudu (e.g. kudu master addr)

The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.

Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
  TPC-DS queries from the testdata/workloads/tpcds/queries directory
  were modified to run against Kudu.

Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
---
A testdata/bin/load-tpc-kudu.py
A testdata/datasets/tpcds/tpcds_kudu_template.sql
A testdata/datasets/tpch/tpch_kudu_template.sql
A testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q27.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q3.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q34.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q42.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q43.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q46.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q52.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q53.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q55.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q59.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q6.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q61.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q63.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q68.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q7.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q73.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q79.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q8.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q88.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q89.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q96.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q98.test
M tests/comparison/db_connection.py
M tests/stress/concurrent_select.py
30 files changed, 2,476 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4327/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4327/6/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS6, Line 4: ---- TODO: Remove the IS NOT NULL predicates when nullable keys are supported
> thanks, but should TPCDS have PKs with nulls in the first place? Basically 
I haven't verified the data and in theory PK columns shouldn't have nulls. I can make another pass over schema and data and see if it's safe to remove the IS NOT NULL predicates.
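
One way to make that pass over the data is to count rows with a NULL in any PK column before dropping the predicates. The helper below is only a sketch that builds such a query string; the function name is hypothetical, and the table and column names in the usage note are examples rather than values taken from the template.

```python
def null_pk_count_query(table, pk_columns):
    """Build a query counting rows where any primary key column is NULL."""
    if not pk_columns:
        raise ValueError("pk_columns must not be empty")
    predicate = " OR ".join("%s IS NULL" % col for col in pk_columns)
    return "SELECT COUNT(*) FROM %s WHERE %s" % (table, predicate)
```

For example, null_pk_count_query("store_sales", ["ss_item_sk", "ss_ticket_number"]) produces a single COUNT(*) query; if it returns zero for every table, the IS NOT NULL predicates are not filtering anything in that data set.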



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 6: Code-Review+1


Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4327/6/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS6, Line 4: ---- TODO: Remove the IS NOT NULL predicates when nullable keys are supported
This won't be supported, but shouldn't these tables be able to have PKs that are non-nullable? Most DBs don't have nullable PKs.


http://gerrit.cloudera.org:8080/#/c/4327/6/testdata/datasets/tpch/tpch_kudu_template.sql
File testdata/datasets/tpch/tpch_kudu_template.sql:

PS6, Line 3: ---- TODO: Remove the IS NOT NULL predicates when nullable keys are supported
same



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#4).

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................

IMPALA-3739: Enable stress tests on Kudu

This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The following changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
   TPC-DS data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
   script is invoked by the stress test runner to load test data in an
   existing Impala/Kudu cluster (both local and CM-managed clusters are
   supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL
   files with TPC-H queries for Kudu were added in a previous patch.
4. Modified the stress test runner to take additional parameters
   specific to Kudu (e.g. kudu master addr)

The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.

Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
  TPC-DS queries from the testdata/workloads/tpcds/queries directory
  were modified to run against Kudu.

Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
---
A testdata/bin/load-tpc-kudu.py
A testdata/datasets/tpcds/tpcds_kudu_template.sql
A testdata/datasets/tpch/tpch_kudu_template.sql
A testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q27.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q3.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q34.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q42.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q43.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q46.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q52.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q53.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q55.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q59.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q6.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q61.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q63.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q68.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q7.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q73.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q79.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q8.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q88.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q89.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q96.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q98.test
M tests/comparison/db_connection.py
M tests/stress/concurrent_select.py
30 files changed, 2,473 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4327/4

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................

IMPALA-3739: Enable stress tests on Kudu

This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The following changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
   TPC-DS data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
   script is invoked by the stress test runner to load test data in an
   existing Impala/Kudu cluster (both local and CM-managed clusters are
   supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL
   files with TPC-H queries for Kudu were added in a previous patch.
4. Modified the stress test runner to take additional parameters
   specific to Kudu (e.g. kudu master addr)

The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.

Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
  TPC-DS queries from the testdata/workloads/tpcds/queries directory
  were modified to run against Kudu.

Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
---
A testdata/bin/load-tpc-kudu.py
A testdata/datasets/tpcds/tpcds_kudu_template.sql
A testdata/datasets/tpch/tpch_kudu_template.sql
A testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q27.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q3.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q34.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q42.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q43.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q46.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q52.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q53.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q55.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q59.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q6.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q61.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q63.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q68.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q7.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q73.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q79.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q8.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q88.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q89.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q96.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q98.test
M tests/comparison/db_connection.py
M tests/stress/concurrent_select.py
30 files changed, 2,478 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4327/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 1:

(2 comments)

I'm still reviewing this, but since my name was called out, here are two comments.

http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/bin/load-tpc-kudu.py
File testdata/bin/load-tpc-kudu.py:

PS1, Line 50: with
> Hm, I've seen other scripts (e.g. load_nested.py)  already using the same s
It's not clear to me that we have to support python2.4 in test code. If we do, then our test code is already broken in many, many places. I would prefer to set the floor for test code at python2.6, which is already inconveniently old. I'm OK to leave this as is.


http://gerrit.cloudera.org:8080/#/c/4327/1/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS1, Line 1382:   if not args.tpcds_db and not args.tpch_db and not args.random_db \
              :       and not args.tpch_nested_db and not args.tpch_kudu_db \
              :       and not args.tpcds_kudu_db and not args.query_file_path:
              :     raise Exception("At least one of --tpcds-db, --tpch-db, --tpch-kudu-db,"
              :         "--tpcds-kudu-db, --tpch-nested-db, --random-db, --query-file-path is required")
> Hmm cumbersome... Maybe someone with more python experience knows a better 
Using "if not any(...)" should work.

https://docs.python.org/2.6/library/functions.html#any
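
A minimal sketch of the any() approach, assuming the argparse attribute names implied by the flags in the quoted snippet. build_parser and require_workload are hypothetical names used only for illustration, not code from concurrent_select.py.

```python
import argparse

# Attribute names inferred from the flags named in the quoted error message.
WORKLOAD_ARGS = ("tpcds_db", "tpch_db", "tpch_kudu_db", "tpcds_kudu_db",
                 "tpch_nested_db", "random_db", "query_file_path")


def build_parser():
    """Hypothetical stand-in for the runner's real argument parser."""
    parser = argparse.ArgumentParser()
    for name in WORKLOAD_ARGS:
        parser.add_argument("--" + name.replace("_", "-"))
    return parser


def require_workload(args):
    """Raise unless at least one workload source was given."""
    if not any(getattr(args, name) for name in WORKLOAD_ARGS):
        flags = ", ".join("--" + name.replace("_", "-") for name in WORKLOAD_ARGS)
        raise Exception("At least one of %s is required" % flags)
```

Driving both the check and the error message from one tuple means a new workload flag only has to be added in one place.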



Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4327

to look at the new patch set (#7).

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................

IMPALA-3739: Enable stress tests on Kudu

This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The following changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
   TPC-DS data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
   script is invoked by the stress test runner to load test data in an
   existing Impala/Kudu cluster (both local and CM-managed clusters are
   supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL
   files with TPC-H queries for Kudu were added in a previous patch.
4. Modified the stress test runner to take additional parameters
   specific to Kudu (e.g. the Kudu master address)

The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.

Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
  TPC-DS queries from the testdata/workloads/tpcds/queries directory
  were modified to run against Kudu.

Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
---
A testdata/bin/load-tpc-kudu.py
A testdata/datasets/tpcds/tpcds_kudu_template.sql
A testdata/datasets/tpch/tpch_kudu_template.sql
A testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q27.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q3.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q34.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q42.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q43.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q46.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q52.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q53.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q55.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q59.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q6.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q61.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q63.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q68.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q7.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q73.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q79.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q8.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q88.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q89.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q96.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q98.test
M tests/comparison/db_connection.py
M tests/stress/concurrent_select.py
30 files changed, 2,468 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4327/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 6: Code-Review+1

(2 comments)

Carry Michael's +1

http://gerrit.cloudera.org:8080/#/c/4327/6/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS6, Line 4: ---- TODO: Remove the IS NOT NULL predicates when nullable keys are supported
> this won't be supported, but shouldn't these tables be able to have PKs tha
Correct. Removed the TODO.


http://gerrit.cloudera.org:8080/#/c/4327/6/testdata/datasets/tpch/tpch_kudu_template.sql
File testdata/datasets/tpch/tpch_kudu_template.sql:

PS6, Line 3: ---- TODO: Remove the IS NOT NULL predicates when nullable keys are supported
> same
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4327/5/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS5, Line 469: 'kudu.table_name'='customer',
> Here's one kudu table called 'customer'. Can this (and others) be expanded 
Done


http://gerrit.cloudera.org:8080/#/c/4327/5/testdata/datasets/tpch/tpch_kudu_template.sql
File testdata/datasets/tpch/tpch_kudu_template.sql:

PS5, Line 185:   'kudu.table_name' = 'customer',
> Here's another kudu table called "customer".
Added "{target_db_name}_" to all the Kudu table names.
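
A small sketch of the effect, assuming the placeholder is filled in with Python's str.format. The template line is adapted from the quoted one, and render and the database names are hypothetical:

```python
# Hypothetical one-line fragment of a table-properties clause; the real
# templates contain full DDL statements.
TEMPLATE = "'kudu.table_name' = '{target_db_name}_customer',"


def render(template, target_db_name):
    # Substitute the database name so each database gets its own
    # Kudu table name.
    return template.format(target_db_name=target_db_name)


tpch_line = render(TEMPLATE, "tpch_kudu")
tpcds_line = render(TEMPLATE, "tpcds_kudu")
```

With the prefix, the TPC-H and TPC-DS templates no longer both ask for a Kudu table named "customer".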


-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#6).

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................

IMPALA-3739: Enable stress tests on Kudu

This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The following changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
   TPC-DS data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
   script is invoked by the stress test runner to load test data in an
   existing Impala/Kudu cluster (both local and CM-managed clusters are
   supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL
   files with TPC-H queries for Kudu were added in a previous patch.
4. Modified the stress test runner to take additional parameters
   specific to Kudu (e.g. the Kudu master address)

The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.

Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
  TPC-DS queries from the testdata/workloads/tpcds/queries directory
  were modified to run against Kudu.

Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
---
A testdata/bin/load-tpc-kudu.py
A testdata/datasets/tpcds/tpcds_kudu_template.sql
A testdata/datasets/tpch/tpch_kudu_template.sql
A testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q27.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q3.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q34.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q42.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q43.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q46.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q52.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q53.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q55.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q59.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q6.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q61.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q63.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q68.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q7.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q73.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q79.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q8.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q88.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q89.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q96.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q98.test
M tests/comparison/db_connection.py
M tests/stress/concurrent_select.py
30 files changed, 2,470 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4327/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 9: Code-Review+2

Rebase and carry MJ's +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: No