You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Thomas Marshall (Code Review)" <ge...@cloudera.org> on 2018/10/26 22:58:44 UTC

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted stress and perf

Thomas Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11805


Change subject: IMPALA-7761: Add multiple DISTINCT to targeted stress and perf
......................................................................

IMPALA-7761: Add multiple DISTINCT to targeted stress and perf

IMPALA-110 added support for queries with multiple DISTINCT aggregates
in a single select list. This patch adds queries to test this
functionality to our targeted-stress and targeted-perf workloads.

Also fixes some incorrect return types in another targeted-perf
aggregation query.

Testing:
- Ran the test file locally.

Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
---
M testdata/workloads/targeted-perf/queries/aggregation.test
M testdata/workloads/targeted-stress/queries/agg_stress.test
2 files changed, 83 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/11805/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted stress and perf

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted stress and perf
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG@16
PS1, Line 16: Testing:
            : - Ran the test file locally.
> You could:
So I guess I didn't realize before making this patch from conversation I had people, that we currently don't run anything in the stress test other than the regular tpch/ds queries. Is that correct?

One problem with putting these queries in the tpch workload and changing the regex is that there are already a number of query files in the tpch workload other than the regular tpch queries which presumably weren't written with the idea of being part of the stress test and aren't necessarily interesting for it.

Of course, we could do something like specify that any query files in the tpch workload that should be included in the stress test are prepended with 'stress-' or something, but how would you feel if I resurrected the targeted-stress workload for this purpose? I would also add a param to concurrent_select to control this, something like --use-targeted-stress with a default of false.

I would then propose also adding these queries to regular CI by enabling tests in stress/test_mini_stress.py (not to actually run concurrently or in a loop, just to do a regular run_test_case call)



-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 02 Nov 2018 19:56:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted stress and perf

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted stress and perf
......................................................................


Patch Set 1: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Oct 2018 21:12:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1292/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Nov 2018 21:01:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3455/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Nov 2018 19:42:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Nov 2018 19:42:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted stress and perf

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted stress and perf
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1183/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Oct 2018 23:30:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Hello Michael Brown, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11805

to look at the new patch set (#3).

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test
......................................................................

IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test

IMPALA-110 added support for queries with multiple DISTINCT aggregates
in a single select list. This patch adds queries to test this
functionality to our targeted-perf workloads and fixes some incorrect
return types in another targeted-perf aggregation query.

It also adds some targeted queries to the stress test by extending the
regex for stress test files to accept files of the form
'tpch-stress-*' and to allow for multiple tests per file.

Testing:
- Added an e2e test that runs the stress test file.

Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
---
M testdata/workloads/targeted-perf/queries/aggregation.test
M testdata/workloads/tpcds/queries/tpcds-q39-1.test
M testdata/workloads/tpcds/queries/tpcds-q39-2.test
A testdata/workloads/tpch/queries/tpch-stress-aggregations.test
M tests/infra/test_stress_infra.py
M tests/query_test/test_aggregation.py
M tests/util/parse_util.py
M tests/util/test_file_parser.py
8 files changed, 112 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/11805/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test
......................................................................


Patch Set 4: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Nov 2018 23:25:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test
......................................................................


Patch Set 3:

This seems fine now. I'm checking how it behaves in a downstream environment on a larger TPCH scale factor. Will update after some runs.


-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Nov 2018 23:00:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted stress and perf

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted stress and perf
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG@16
PS1, Line 16: Testing:
            : - Ran the test file locally.
> By "run locally" I mean that I put together a python test that run the file
You could:

1. Put the queries into the tpch workload directory

2. Get them running regularly in query_test/test_tpch.py so that your queries run in regular CI immediately

3. Alter the stress test regex and unit test that makes sure all expected queries are there

Commentary:

The easiest way to get them into the stress test is to add queries, 1 query per .test file, to the tpch workload, and extend the regex to match the files you added. To prevent the regex/matching from regressing, there are CI tests to make sure we match an expected number of files (at least two bugs in the past have busted the matching when people altered the countents of the tpc workload directories). That test is here: https://github.com/apache/impala/blob/master/tests/infra/test_stress_infra.py#L53 .

You could also run the queries in query_test/test_tpch.py, at which point you'll have regular CI on these queries as opposed to them being put into odd workloads.



-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Oct 2018 17:54:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted stress and perf

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted stress and perf
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG@16
PS1, Line 16: Testing:
            : - Ran the test file locally.
> > One problem with putting these queries in the tpch workload and
 > changing the regex is that there are already a number of query
 > files in the tpch workload other than the regular tpch queries
 > which presumably weren't written with the idea of being part of the
 > stress test and aren't necessarily interesting for it.
 > 
 > How is this a problem? The regex could opt in to the queries you
 > add without including the ones we're not interested in. I don't
 > personally see anything wrong with the stress- idea.
 
This is what I ended up going with



-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Nov 2018 20:26:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1296/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Nov 2018 21:33:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted stress and perf

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted stress and perf
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG@16
PS1, Line 16: Testing:
            : - Ran the test file locally.
> The queries are fine, but what does it mean to run these locally? I'm not t
By "run locally" I mean that I put together a python test that run the files with run_test_case(), like we would do for an equivalent file in the functional-query workload. The idea was to verify that I had the RESULTS/TYPES/LABELS sections correct.

The targeted-perf workload is used by single_node_perf_run.py. I believe its something that Mostafa also used to run to check for regressions between releases. Even if no one is running it currently, it seems reasonable to me to add these queries for the future when there is someone paying closer attention to perf issues.

The targeted-stress workload seems to be pointless. Its only references in tests/stress/test_mini_stress.py and all of the tests there have been xfailed for years. Probably we should remove targeted-stress, or even better move the queries in workloads/tpch/ that aren't actually regular tpch queries to targeted-stress and modify the stress test to use them. It would also be great to re-enable the tests in test_mini_stress.py, even if just to ensure that the RESULTS/TYPES/LABELS sections stay up to date. That's all (hopefully) outside the scope of this patch.

Yes, the intention is for these to be run as part of downstream larger-scale testing. Is there any way to accomplish that currently? It seems that concurrent_select won't find them even if I add them to eg. testdata/workloads/tpch/queries/tpch-aggregations.test, as it only checks for test files of the form 'tpch-q*.test'



-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Oct 2018 20:55:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Nov 2018 00:33:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted stress and perf

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted stress and perf
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG@16
PS1, Line 16: Testing:
            : - Ran the test file locally.
> we currently don't run anything in the stress test other than the regular tpch/ds queries. Is that correct?

Correct.

> One problem with putting these queries in the tpch workload and changing the regex is that there are already a number of query files in the tpch workload other than the regular tpch queries which presumably weren't written with the idea of being part of the stress test and aren't necessarily interesting for it.

How is this a problem? The regex could opt in to the queries you add without including the ones we're not interested in. I don't personally see anything wrong with the stress- idea.

In general there is a problem with our test data and workload organization, because a workload is about query characteristics, but a given workload may depend on one or more different data sets, and it's kind of a mess. There's nothing to bind that workload with the data it needs. Putting everything into tpch at least helps keep that binding, albeit softly: "We load TPCH data, and here are the queries that use TPCH data".

> how would you feel if I resurrected the targeted-stress workload for this purpose?

It depends on how you plan to implement it. agg_stress.test and stress-with-invalidate-refresh.test won't work for the parser because there are multiple statements in the files, so you can't just whitelist this entire directory. That will make your patch larger.

Note too --query-file-path exists.



-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 19:10:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted stress and perf

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted stress and perf
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG@16
PS1, Line 16: Testing:
            : - Ran the test file locally.
The queries are fine, but what does it mean to run these locally? I'm not too familiar with these workloads (targeted-perf, targeted-stress). Do the queries get run in GVO? Is it expected downstream larger-scale testing will run these? The stress test (concurrent_select.py) won't find these queries, for example.



-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Oct 2018 22:29:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted stress and perf

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Hello Michael Brown, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11805

to look at the new patch set (#2).

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted stress and perf
......................................................................

IMPALA-7761: Add multiple DISTINCT to targeted stress and perf

IMPALA-110 added support for queries with multiple DISTINCT aggregates
in a single select list. This patch adds queries to test this
functionality to our targeted-stress and targeted-perf workloads.

Also fixes some incorrect return types in another targeted-perf
aggregation query.

Testing:
- Ran the test file locally.

Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
---
M testdata/workloads/targeted-perf/queries/aggregation.test
M testdata/workloads/tpcds/queries/tpcds-q39-1.test
M testdata/workloads/tpcds/queries/tpcds-q39-2.test
A testdata/workloads/tpch/queries/tpch-stress-agg1.test
M tests/infra/test_stress_infra.py
M tests/util/parse_util.py
M tests/util/test_file_parser.py
7 files changed, 109 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/11805/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11805 )

Change subject: IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test
......................................................................

IMPALA-7761: Add multiple DISTINCT to targeted perf and stress test

IMPALA-110 added support for queries with multiple DISTINCT aggregates
in a single select list. This patch adds queries to test this
functionality to our targeted-perf workloads and fixes some incorrect
return types in another targeted-perf aggregation query.

It also adds some targeted queries to the stress test by extending the
regex for stress test files to accept files of the form
'tpch-stress-*' and to allow for multiple tests per file.

Testing:
- Added an e2e test that runs the stress test file.

Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Reviewed-on: http://gerrit.cloudera.org:8080/11805
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M testdata/workloads/targeted-perf/queries/aggregation.test
M testdata/workloads/tpcds/queries/tpcds-q39-1.test
M testdata/workloads/tpcds/queries/tpcds-q39-2.test
A testdata/workloads/tpch/queries/tpch-stress-aggregations.test
M tests/infra/test_stress_infra.py
M tests/query_test/test_aggregation.py
M tests/util/parse_util.py
M tests/util/test_file_parser.py
8 files changed, 112 insertions(+), 13 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/11805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Gerrit-Change-Number: 11805
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>