You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tim Wood (Code Review)" <ge...@cloudera.org> on 2017/11/10 01:50:05 UTC

[Impala-ASF-CR] IMPALA-6160: Rework query execution to handle multiple statements in a Query object.

Tim Wood has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8513


Change subject: IMPALA-6160: Rework query execution to handle multiple statements in a Query object.
......................................................................

IMPALA-6160: Rework query execution to handle multiple statements in a Query object.

Testing:
- Reproduced problem with bin/run-workload.py.
- Ran bin/run-workload.py --workloads=tpch,targeted-perf,tpcds
  --impalads=localhost:21000,localhost:21001,localhost:21002
  --results_json_file=$PWD/perf_results/IMPALA-6160.json
  --query_iterations=3 --table_formats=parquet/none --plan_first --query_names='.*'
  (Close to command line that single_node_perf_run.py builds.)

Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
---
M tests/performance/query_executor.py
1 file changed, 55 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/8513/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Hello Matthew Mulder, Michael Brown, Jim Apple, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8513

to look at the new patch set (#3).

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................

IMPALA-6160: Allow multiple statements in a Query object.

Testing:
- Reproduced problem with bin/run-workload.py.
- Ran bin/run-workload.py --workloads=tpch,targeted-perf,tpcds
  --impalads=localhost:21000,localhost:21001,localhost:21002
  --results_json_file=$PWD/perf_results/IMPALA-6160.json
  --query_iterations=3 --table_formats=parquet/none --plan_first
  --query_names='.*' (Close to command line that single_node_perf_run.py
  builds.)
- Manually reviewed perf_results/IMPALA-6160.json to verify presence of
  plans and proper splitting of query batches.

Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
---
M tests/performance/query_executor.py
1 file changed, 56 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/8513/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-6160: Rework query execution to handle multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Rework query execution to handle multiple statements in a Query object.
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8513/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8513/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-6160: Rework query execution to handle multiple statements in a Query object.
> nit: For the subject line, please keep it under 72 characters and one line.
Ack


http://gerrit.cloudera.org:8080/#/c/8513/1//COMMIT_MSG@14
PS1, Line 14:   --query_iterations=3 --table_formats=parquet/none --plan_first --query_names='.*'
> nit: can you wrap at 72 characters, here? If you use emacs to write your co
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Nov 2017 20:44:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6160: Rework query execution to handle multiple statements in a Query object.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Rework query execution to handle multiple statements in a Query object.
......................................................................


Patch Set 1:

(9 comments)

Thanks for taking this on!

http://gerrit.cloudera.org:8080/#/c/8513/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8513/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-6160: Rework query execution to handle multiple statements in a Query object.
nit: For the subject line, please keep it under 72 characters and one line. Maybe "Allow multiple statements in a query object"?


http://gerrit.cloudera.org:8080/#/c/8513/1//COMMIT_MSG@14
PS1, Line 14:   --query_iterations=3 --table_formats=parquet/none --plan_first --query_names='.*'
nit: can you wrap at 72 characters, here? If you use emacs to write your commit message, you can wrap automatically with ctrl-q, though it might not understand your bullets and then rewrap those unhelpfully.


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@45
PS1, Line 45: query options' names
I'm a bit confused about this - I don't see query options in these regexes.


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@52
PS1, Line 52: SELECT
SELECT starts statements that are not DDL or DML, yes?


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@52
PS1, Line 52: re.I
I think spelling out IGNORECASE makes it much more readable.


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@186
PS1, Line 186:   """Executes a query.
Can you change the comments and member variable to pluralize "query"?


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@224
PS1, Line 224: a set of SQL statements
This looks very general to me. Might you be able to get by with something simpler that only handles one SET followed by one EXPLAINable statement?


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@229
PS1, Line 229: timing
How DOES the timing work for this object? There doesn't seem to be any timing code directly in this. Maybe it is done in self.exec_func storing the timing info somewhere? The EXPLAINs are done with self.exec_func as well, so I'd guess that every call to self.exec_func clobbers the old timing. If that's right, then executing multiple queries will lead to only a timing from the last call being stored, yes?


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@231
PS1, Line 231: 
             :     This function originally assumed that self.query contained only a single
             :     query statement, but that assumption is not valid for a test file that
             :     contains, e.g., a SET DECIMAL_V2=n; statement before the actual query for the
             :     test.
nit: I think you can omit this.



-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Sat, 11 Nov 2017 04:02:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py@45
PS4, Line 45: DDL/DML
nit: CRUD


http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py@190
PS4, Line 190: query (Query)
nit: can you please pluralize query everywhere it can be multiple queries, including this comment, line 200, the argument in line 206, the attribute on line 209, and so on?


http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py@230
PS4, Line 230: (generally, DDL and DML statements)
nit: please remove



-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 14:55:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8513/5/tests/performance/query.py
File tests/performance/query.py:

http://gerrit.cloudera.org:8080/#/c/8513/5/tests/performance/query.py@26
PS5, Line 26:     query_str (str): SQL query string; contains 1 or more ;-delimited SQL statements.
Key context.


http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py@45
PS4, Line 45: CRUD st
> Ack
Done


http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py@200
PS4, Line 200:     query (Query): Container holding 1 or more ;-delimited SQL statements to be executed
> Will explain object identity here; this actually bit me when I treated it a
Done


http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py@230
PS4, Line 230: first so timing does not include th
> Ack
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 19:24:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8513/5/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/5/tests/performance/query_executor.py@47
PS5, Line 47: COMMENT_LINES_REGEX = r'(?:\s*--.*\n)*'
> Update comment.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 22:03:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Hello Matthew Mulder, Michael Brown, Jim Apple, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8513

to look at the new patch set (#5).

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................

IMPALA-6160: Allow multiple statements in a Query object.

Testing:
- Reproduced problem with bin/run-workload.py.
- Ran bin/run-workload.py --workloads=tpch,targeted-perf,tpcds
  --impalads=localhost:21000,localhost:21001,localhost:21002
  --results_json_file=$PWD/perf_results/IMPALA-6160.json
  --query_iterations=3 --table_formats=parquet/none --plan_first
  --query_names='.*' (Close to command line that single_node_perf_run.py
  builds.)
- Manually reviewed perf_results/IMPALA-6160.json to verify presence of
  plans and proper splitting of query batches.

Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
---
M tests/performance/query.py
M tests/performance/query_executor.py
2 files changed, 60 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/8513/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................


Patch Set 7: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Nov 2017 19:28:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Nov 2017 19:38:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................

IMPALA-6160: Allow multiple statements in a Query object.

Testing:
- Reproduced problem with bin/run-workload.py.
- Ran bin/run-workload.py --workloads=tpch,targeted-perf,tpcds
  --impalads=localhost:21000,localhost:21001,localhost:21002
  --results_json_file=$PWD/perf_results/IMPALA-6160.json
  --query_iterations=3 --table_formats=parquet/none --plan_first
  --query_names='.*' (Close to command line that single_node_perf_run.py
  builds.)
- Manually reviewed perf_results/IMPALA-6160.json to verify presence of
  plans and proper splitting of query batches.

Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Reviewed-on: http://gerrit.cloudera.org:8080/8513
Tested-by: Impala Public Jenkins
Reviewed-by: Jim Apple <jb...@apache.org>
---
M tests/performance/query.py
M tests/performance/query_executor.py
2 files changed, 60 insertions(+), 25 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Jim Apple: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................


Patch Set 3:

(2 comments)

Just nits left

http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@52
PS1, Line 52: 
> I consider SELECT as DML (data manipulation, but not mutation. :)  I could 
I get that you consider SELECT as DML, but I don't think most other people do, so I'd suggest renaming as a favor to them.


http://gerrit.cloudera.org:8080/#/c/8513/3/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/3/tests/performance/query_executor.py@190
PS3, Line 190:     query (Query): SQL query (batch, ;-delimited) to be executed
This is still queries, not a single query, right?



-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Nov 2017 20:53:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1473/


-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Nov 2017 15:55:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@52
PS1, Line 52: 
> Ack
Done


http://gerrit.cloudera.org:8080/#/c/8513/3/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/3/tests/performance/query_executor.py@190
PS3, Line 190:     query (Query): SQL query/queries (batch, ;-delimited) to be executed
> One or more; does the word "batch" not imply that?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Nov 2017 21:24:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6160: Rework query execution to handle multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Rework query execution to handle multiple statements in a Query object.
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@45
PS1, Line 45: query options' names
> I'm a bit confused about this - I don't see query options in these regexes.
Residue; will correct.


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@52
PS1, Line 52: re.I
> I think spelling out IGNORECASE makes it much more readable.
Ack


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@52
PS1, Line 52: SELECT
> SELECT starts statements that are not DDL or DML, yes?
I consider SELECT as DML (data manipulation, but not mutation. :)  I could call this DDL_CRUD_PATTERN if it's that important.


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@186
PS1, Line 186:   """Executes a query.
> Can you change the comments and member variable to pluralize "query"?
Ack


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@224
PS1, Line 224: a set of SQL statements
> This looks very general to me. Might you be able to get by with something s
I don't want to limit the use cases to just SET, when there's nothing really unique about it.  This routine would get more complicated(*), add needless error conditions, and hasten the day when it would need changing again.  

* What if it's 2 SETs and a query?  A query, a SET and a query?  There's no need to reject these legal batches.  And, it turns out, the performance harness assumes results are correct without checking.


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@229
PS1, Line 229: timing
> How DOES the timing work for this object? There doesn't seem to be any timi
The execution logic in query_exec_functions.py sets the start_time and time_taken fields of the various QueryResult objects.  The loop at l. 256 below does overwrite the reference to the query result for statements 1..n-1 and leaves the result of the last as the result of the batch.  I will document this.  So far, there's no requirement for the higher layers to receive a list of results for the batch.


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@231
PS1, Line 231: 
             :     This function originally assumed that self.query contained only a single
             :     query statement, but that assumption is not valid for a test file that
             :     contains, e.g., a SET DECIMAL_V2=n; statement before the actual query for the
             :     test.
> nit: I think you can omit this.
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Nov 2017 19:40:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6160: Rework query execution to handle multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Hello Matthew Mulder, Michael Brown, Jim Apple, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8513

to look at the new patch set (#2).

Change subject: IMPALA-6160: Rework query execution to handle multiple statements in a Query object.
......................................................................

IMPALA-6160: Rework query execution to handle multiple statements in a Query object.

Testing:
- Reproduced problem with bin/run-workload.py.
- Ran bin/run-workload.py --workloads=tpch,targeted-perf,tpcds
  --impalads=localhost:21000,localhost:21001,localhost:21002
  --results_json_file=$PWD/perf_results/IMPALA-6160.json
  --query_iterations=3 --table_formats=parquet/none --plan_first --query_names='.*'
  (Close to command line that single_node_perf_run.py builds.)
- Manually reviewed perf_results/IMPALA-6160.json to verify presence of plans
  and proper splitting of query batches.

Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
---
M tests/performance/query_executor.py
1 file changed, 55 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/8513/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Hello Matthew Mulder, Michael Brown, Jim Apple, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8513

to look at the new patch set (#4).

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................

IMPALA-6160: Allow multiple statements in a Query object.

Testing:
- Reproduced problem with bin/run-workload.py.
- Ran bin/run-workload.py --workloads=tpch,targeted-perf,tpcds
  --impalads=localhost:21000,localhost:21001,localhost:21002
  --results_json_file=$PWD/perf_results/IMPALA-6160.json
  --query_iterations=3 --table_formats=parquet/none --plan_first
  --query_names='.*' (Close to command line that single_node_perf_run.py
  builds.)
- Manually reviewed perf_results/IMPALA-6160.json to verify presence of
  plans and proper splitting of query batches.

Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
---
M tests/performance/query_executor.py
1 file changed, 56 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/8513/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py@45
PS4, Line 45: DDL/DML
> nit: CRUD
Ack


http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py@190
PS4, Line 190: query (Query)
> nit: can you please pluralize query everywhere it can be multiple queries, 
This "query" is not a string containing a batch of SQL statements, it's our Query object in Python.  The multiple-ness is abstracted away, and only concerns the execute() function below.  I'll change the synopsis to make that clear, but other users of QueryExecutor understand .query to be an object.


http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py@200
PS4, Line 200:     query (Query): SQL query to be executed
Will explain object identity here; this actually bit me when I treated it as a string!


http://gerrit.cloudera.org:8080/#/c/8513/4/tests/performance/query_executor.py@230
PS4, Line 230: (generally, DDL and DML statements)
> nit: please remove
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 18:55:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8513/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8513/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-6160: Allow multiple statements in a Query object.
> Ack
Done


http://gerrit.cloudera.org:8080/#/c/8513/1//COMMIT_MSG@14
PS1, Line 14:   --query_iterations=3 --table_formats=parquet/none --plan_first
> Ack
Done


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@45
PS1, Line 45: 
> Residue; will correct.
Done


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@52
PS1, Line 52: 
> Ack
Done


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@186
PS1, Line 186:   """Executes one or more queries.
> Ack
Done


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@229
PS1, Line 229: le" qu
> The execution logic in query_exec_functions.py sets the start_time and time
Done


http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@231
PS1, Line 231:     metadata loading required for planning.
             : 
             :     This function furnishes a query result object in self._result, for the last
             :     query in the batch ONLY.
             :     """
> Ack
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Nov 2017 20:48:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/1/tests/performance/query_executor.py@52
PS1, Line 52: 
> I get that you consider SELECT as DML, but I don't think most other people 
Ack


http://gerrit.cloudera.org:8080/#/c/8513/3/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/3/tests/performance/query_executor.py@190
PS3, Line 190:     query (Query): SQL query (batch, ;-delimited) to be executed
> This is still queries, not a single query, right?
One or more; does the word "batch" not imply that?

I'll put "query/queries".



-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Nov 2017 21:12:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Hello Matthew Mulder, Michael Brown, Jim Apple, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8513

to look at the new patch set (#6).

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................

IMPALA-6160: Allow multiple statements in a Query object.

Testing:
- Reproduced problem with bin/run-workload.py.
- Ran bin/run-workload.py --workloads=tpch,targeted-perf,tpcds
  --impalads=localhost:21000,localhost:21001,localhost:21002
  --results_json_file=$PWD/perf_results/IMPALA-6160.json
  --query_iterations=3 --table_formats=parquet/none --plan_first
  --query_names='.*' (Close to command line that single_node_perf_run.py
  builds.)
- Manually reviewed perf_results/IMPALA-6160.json to verify presence of
  plans and proper splitting of query batches.

Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
---
M tests/performance/query.py
M tests/performance/query_executor.py
2 files changed, 60 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/8513/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-6160: Allow multiple statements in a Query object.

Posted by "Tim Wood (Code Review)" <ge...@cloudera.org>.
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8513 )

Change subject: IMPALA-6160: Allow multiple statements in a Query object.
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8513/5/tests/performance/query_executor.py
File tests/performance/query_executor.py:

http://gerrit.cloudera.org:8080/#/c/8513/5/tests/performance/query_executor.py@47
PS5, Line 47: # strip out.
Update comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/8513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac86af181b7c42655f21d2c1efd4652dd35d9297
Gerrit-Change-Number: 8513
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Wood <tw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Mulder <mm...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Wood <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 22:00:46 +0000
Gerrit-HasComments: Yes