You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Amogh Margoor (Code Review)" <ge...@cloudera.org> on 2021/09/19 23:15:48 UTC

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Amogh Margoor has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17855


Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................

IMPALA-10921 Add script to compare TPCDS runs.

This script compares 2 runs of TPCDS by parsing their respective
Impala plain text query profiles. It currently outputs the peak
memory comparision of both runs where:
1. It compares average per-node peak memory and geo-mean
   per-node peak memory.
2. It compares max peak memory reduction among Hash operators.

It can be extended to other comparisions in future.

Example usage:

 tpcds_run_comparator.py <path to base run profile> <path to new run profile>

Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
---
A bin/diagnostics/experimental/tpcds_run_comparator.py
1 file changed, 263 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/17855/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 1
Gerrit-Owner: Amogh Margoor <am...@gmail.com>

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 1:

(39 comments)

http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py
File bin/diagnostics/experimental/tpcds_run_comparator.py:

http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@44
PS1, Line 44: class Task(Enum):
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@72
PS1, Line 72: def debug(s):
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@76
PS1, Line 76: def geo_mean(bytes):
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@79
PS1, Line 79: /
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@81
PS1, Line 81: def str_to_byte(line):
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@96
PS1, Line 96: class ProfileParser:
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@105
PS1, Line 105: #
flake8: E265 block comment should start with '# '


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@106
PS1, Line 106: =
flake8: E225 missing whitespace around operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@107
PS1, Line 107: =
flake8: E225 missing whitespace around operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@108
PS1, Line 108:     
flake8: W293 blank line contains whitespace


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@108
PS1, Line 108:     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@110
PS1, Line 110: d
flake8: E303 too many blank lines (2)


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@169
PS1, Line 169: c
flake8: E999 SyntaxError: invalid syntax


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@181
PS1, Line 181: /
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@184
PS1, Line 184: /
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@184
PS1, Line 184: *
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@185
PS1, Line 185: /
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@185
PS1, Line 185: *
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@192
PS1, Line 192: +
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@194
PS1, Line 194: +
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@196
PS1, Line 196: +
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@198
PS1, Line 198: +
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@202
PS1, Line 202: /
flake8: E225 missing whitespace around operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@203
PS1, Line 203: /
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@204
PS1, Line 204: /
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@204
PS1, Line 204: *
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@204
PS1, Line 204: /
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@204
PS1, Line 204: *
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@204
PS1, Line 204: /
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@204
PS1, Line 204: *
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@205
PS1, Line 205: /
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@205
PS1, Line 205: *
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@211
PS1, Line 211: def print_results(ht_mem_res, op_mem_res):
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@227
PS1, Line 227: def is_dir(path):
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@232
PS1, Line 232: def main():
flake8: E302 expected 2 blank lines, found 0


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@246
PS1, Line 246: +
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@246
PS1, Line 246: +
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@248
PS1, Line 248: +
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@262
PS1, Line 262: if __name__ == "__main__":
flake8: E305 expected 2 blank lines after class or function definition, found 0



-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 1
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sun, 19 Sep 2021 23:16:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9470/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 1
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sun, 19 Sep 2021 23:39:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17855/2/bin/diagnostics/experimental/tpcds_run_comparator.py
File bin/diagnostics/experimental/tpcds_run_comparator.py:

http://gerrit.cloudera.org:8080/#/c/17855/2/bin/diagnostics/experimental/tpcds_run_comparator.py@183
PS2, Line 183: c
flake8: E999 SyntaxError: invalid syntax


http://gerrit.cloudera.org:8080/#/c/17855/2/bin/diagnostics/experimental/tpcds_run_comparator.py@250
PS2, Line 250: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/17855/2/bin/diagnostics/experimental/tpcds_run_comparator.py@277
PS2, Line 277: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/17855/2/bin/diagnostics/experimental/tpcds_run_comparator.py@299
PS2, Line 299: 
flake8: W292 no newline at end of file



-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 2
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 13:17:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 5: Code-Review+2

Nice work!


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 15:45:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17855/3/bin/diagnostics/experimental/tpcds_run_comparator.py
File bin/diagnostics/experimental/tpcds_run_comparator.py:

http://gerrit.cloudera.org:8080/#/c/17855/3/bin/diagnostics/experimental/tpcds_run_comparator.py@182
PS3, Line 182: c
flake8: E999 SyntaxError: invalid syntax



-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 3
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 13:27:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17855/4/bin/diagnostics/experimental/tpcds_run_comparator.py
File bin/diagnostics/experimental/tpcds_run_comparator.py:

http://gerrit.cloudera.org:8080/#/c/17855/4/bin/diagnostics/experimental/tpcds_run_comparator.py@39
PS4, Line 39: #   tpcds_run_comparator.py <path to base run profile> <path to new run profile>
Update the example usage with <path to result csv file> ?


http://gerrit.cloudera.org:8080/#/c/17855/4/bin/diagnostics/experimental/tpcds_run_comparator.py@247
PS4, Line 247:     header2 = ['TPCDS query profile', 'base avg (MB)', 'new avg (MB)',
             :       'avg reduction %', 'base geomean (MB)', 'new geomean (MB)', 'geomean reduction %']
What I meant in my comment before was to split the output table to 2 separate CSV files.
But this seems to print both on the same file?



-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 17:10:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7535/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 09:30:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 1:

(5 comments)

Thanks for adding using useful tools to the repository!

http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py
File bin/diagnostics/experimental/tpcds_run_comparator.py:

http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@57
PS1, Line 57: SECTIONS = [
nit: could you please add some comments for SECTIONS? It's a bit misleading to use the next section's first line to mark the end of the current section.


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@93
PS1, Line 93:     byte *= 1024**3
Maybe raise an error for unknown 'unit'.


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@132
PS1, Line 132:     if len(parts) == 7:
Could you add some comments here?


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@180
PS1, Line 180: <
maybe it would be interesting to see if there was an increase in some cases.


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@231
PS1, Line 231: "{path} is not a valid path
needs format?



-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 1
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Sep 2021 16:02:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has removed a vote on this change.

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Removed Verified-1 by Impala Public Jenkins <im...@cloudera.com>
-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Amogh Margoor (Code Review)" <ge...@cloudera.org>.
Amogh Margoor has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................

IMPALA-10921 Add script to compare TPCDS runs.

This script compares 2 runs of TPCDS by parsing their respective
Impala plain text query profiles. It currently outputs the peak
memory comparision of both runs where:
1. It compares average per-node peak memory and geo-mean
   per-node peak memory.
2. It compares max peak memory reduction among Hash operators.

It can be extended to other comparisions in future.

Example usage:

 tpcds_run_comparator.py <path to base run profile>
 <path to new run profile> [path to result csv file]

Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
---
A bin/diagnostics/experimental/tpcds_run_comparator.py
1 file changed, 297 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/17855/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Amogh Margoor (Code Review)" <ge...@cloudera.org>.
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17855/4/bin/diagnostics/experimental/tpcds_run_comparator.py
File bin/diagnostics/experimental/tpcds_run_comparator.py:

http://gerrit.cloudera.org:8080/#/c/17855/4/bin/diagnostics/experimental/tpcds_run_comparator.py@39
PS4, Line 39: #   tpcds_run_comparator.py <path to base run profile> <path to new run profile>
> Update the example usage with <path to result csv file> ?
done.


http://gerrit.cloudera.org:8080/#/c/17855/4/bin/diagnostics/experimental/tpcds_run_comparator.py@247
PS4, Line 247:     writer.writerows(ht_mem_res)
             :     header2 = ['TPCDS query profile', 'base avg (MB)', 'new avg (MB)',
> What I meant in my comment before was to split the output table to 2 separa
Yes, it prints to one file. Having 2 files would be unnecessary for same analysis. I envision this script to have multiple analysis in future like peak memory analysis and each analysis would having multiple comparison metric (max per-operator peak memory, node peak memory etc). It is ok to have separate csv for every analysis, but having it for every comparison would be an overkill.



-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 11:58:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Amogh Margoor (Code Review)" <ge...@cloudera.org>.
Amogh Margoor has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................

IMPALA-10921 Add script to compare TPCDS runs.

This script compares 2 runs of TPCDS by parsing their respective
Impala plain text query profiles. It currently outputs the peak
memory comparision of both runs where:
1. It compares average per-node peak memory and geo-mean
   per-node peak memory.
2. It compares max peak memory reduction among Hash operators.

It can be extended to other comparisions in future.

Example usage:

 tpcds_run_comparator.py <path to base run profile> <path to new run profile>
 <path to result csv file>

Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
---
A bin/diagnostics/experimental/tpcds_run_comparator.py
1 file changed, 300 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/17855/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 3
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9591/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 12:15:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7530/


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 21:58:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 5:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7535/


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 15:36:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7537/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Oct 2021 15:45:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 5: Verified+1

The verify jobs failed with unrelated errors.


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 15 Oct 2021 10:50:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7530/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 15:46:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 5:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7537/


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Oct 2021 21:57:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9584/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 13:57:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9583/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 3
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 13:47:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 5: Code-Review+1

(4 comments)

LGTM, please carry my +1.

http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py
File bin/diagnostics/experimental/tpcds_run_comparator.py:

http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@211
PS1, Line 211:     max1 = max(self.peak_mem)
> Added the option to write results to csv.
thank you.


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@245
PS1, Line 245:     header1 = ['TPCDS query profile', 'base peak memory', 'new peak memory']
             :     writer.writerow(header1)
> Good suggestion. taken care of.
thanks!


http://gerrit.cloudera.org:8080/#/c/17855/4/bin/diagnostics/experimental/tpcds_run_comparator.py
File bin/diagnostics/experimental/tpcds_run_comparator.py:

http://gerrit.cloudera.org:8080/#/c/17855/4/bin/diagnostics/experimental/tpcds_run_comparator.py@39
PS4, Line 39: #   tpcds_run_comparator.py <path to base run profile> <path to new run profile>
> done.
thanks!


http://gerrit.cloudera.org:8080/#/c/17855/4/bin/diagnostics/experimental/tpcds_run_comparator.py@247
PS4, Line 247:     writer.writerows(ht_mem_res)
             :     header2 = ['TPCDS query profile', 'base avg (MB)', 'new avg (MB)',
> Yes, it prints to one file. Having 2 files would be unnecessary for same an
Ok, seems appropriate if we'll have more analyses.



-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 15:11:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Amogh Margoor (Code Review)" <ge...@cloudera.org>.
Amogh Margoor has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................

IMPALA-10921 Add script to compare TPCDS runs.

This script compares 2 runs of TPCDS by parsing their respective
Impala plain text query profiles. It currently outputs the peak
memory comparision of both runs where:
1. It compares average per-node peak memory and geo-mean
   per-node peak memory.
2. It compares max peak memory reduction among Hash operators.

It can be extended to other comparisions in future.

Example usage:

 tpcds_run_comparator.py <path to base run profile> <path to new run profile>
 <path to result csv file>

Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
---
A bin/diagnostics/experimental/tpcds_run_comparator.py
1 file changed, 299 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/17855/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 2
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 1:

(2 comments)

Hi Amogh,
Thanks for submitting the script.
Besides the flake8 errors, I have few comments and request to incorporate.

http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py
File bin/diagnostics/experimental/tpcds_run_comparator.py:

http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@211
PS1, Line 211: def print_results(ht_mem_res, op_mem_res):
It will be great if we can add option to print this output to 2 csv files.


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@245
PS1, Line 245:     if (i == 23 or i == 24):
             :       filenames = [str(i)+"_a.txt", str(i)+"_b.txt"]
For iterating the profiles, what if we just use simple filename matching between the baseline dir and new measurement dir?
The reason is that the naming convention is not always standardized. I have seen variation such as '23_a.txt', '23a.txt', 'tpcds-23a.txt', and so on.
Using simple filename match will also make the script more general, not specific to just TPC-DS.



-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 1
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Sep 2021 18:17:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Amogh Margoor (Code Review)" <ge...@cloudera.org>.
Amogh Margoor has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................

IMPALA-10921 Add script to compare TPCDS runs.

This script compares 2 runs of TPCDS by parsing their respective
Impala plain text query profiles. It currently outputs the peak
memory comparision of both runs where:
1. It compares average per-node peak memory and geo-mean
   per-node peak memory.
2. It compares max peak memory reduction among Hash operators.

It can be extended to other comparisions in future.

Example usage:

 tpcds_run_comparator.py <path to base run profile>
 <path to new run profile> <path to result csv file>

Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
---
A bin/diagnostics/experimental/tpcds_run_comparator.py
1 file changed, 296 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/17855/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9582/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 2
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 13:38:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Amogh Margoor (Code Review)" <ge...@cloudera.org>.
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................


Patch Set 2:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py
File bin/diagnostics/experimental/tpcds_run_comparator.py:

http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@44
PS1, Line 44: 
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@57
PS1, Line 57: RE_PEAK_MEM = re.compile("\d+\.\d\d [GMK]?B")
> nit: could you please add some comments for SECTIONS? It's a bit misleading
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@72
PS1, Line 72: 
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@76
PS1, Line 76: 
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@79
PS1, Line 79: 
> flake8: E226 missing whitespace around arithmetic operator
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@81
PS1, Line 81: 
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@93
PS1, Line 93:   unit = parts[1]
> Maybe raise an error for unknown 'unit'.
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@96
PS1, Line 96:   elif unit == 'KB':
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@105
PS1, Line 105: 
> flake8: E265 block comment should start with '# '
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@106
PS1, Line 106: 
> flake8: E225 missing whitespace around operator
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@107
PS1, Line 107: :
> flake8: E225 missing whitespace around operator
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@108
PS1, Line 108:   """Class that parses Impala plain text query profile"""
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@108
PS1, Line 108:   """Class that parses Impala plain text query profile"""
> flake8: W293 blank line contains whitespace
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@110
PS1, Line 110: d
> flake8: E303 too many blank lines (2)
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@132
PS1, Line 132:     """Parse execution summary section.
> Could you add some comments here?
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@180
PS1, Line 180: 
> maybe it would be interesting to see if there was an increase in some cases
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@181
PS1, Line 181: 
> flake8: E226 missing whitespace around arithmetic operator
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@184
PS1, Line 184: 
> flake8: E226 missing whitespace around arithmetic operator
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@184
PS1, Line 184: 
> flake8: E226 missing whitespace around arithmetic operator
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@185
PS1, Line 185: 
> flake8: E226 missing whitespace around arithmetic operator
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@192
PS1, Line 192: _
> flake8: E226 missing whitespace around arithmetic operator
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@211
PS1, Line 211:     geo_mean2 = geo_mean(pp.peak_mem)
> It will be great if we can add option to print this output to 2 csv files.
Added the option to write results to csv.


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@227
PS1, Line 227: def print_results(ht_mem_res, op_mem_res):
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@231
PS1, Line 231: operator Peak Memory")
> needs format?
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@232
PS1, Line 232:   print("""TPCDS query profile, base peak memory, new peak memory""")
> flake8: E302 expected 2 blank lines, found 0
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@245
PS1, Line 245:   with open(csv_path, 'w') as csv_f:
             :     writer = csv.writer(csv_f)
> For iterating the profiles, what if we just use simple filename matching be
Good suggestion. taken care of.


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@248
PS1, Line 248: r
> flake8: E226 missing whitespace around arithmetic operator
Done


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@262
PS1, Line 262: 
> flake8: E305 expected 2 blank lines after class or function definition, fou
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 2
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 13:19:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
......................................................................

IMPALA-10921 Add script to compare TPCDS runs.

This script compares 2 runs of TPCDS by parsing their respective
Impala plain text query profiles. It currently outputs the peak
memory comparision of both runs where:
1. It compares average per-node peak memory and geo-mean
   per-node peak memory.
2. It compares max peak memory reduction among Hash operators.

It can be extended to other comparisions in future.

Example usage:

 tpcds_run_comparator.py <path to base run profile>
 <path to new run profile> [path to result csv file]

Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Reviewed-on: http://gerrit.cloudera.org:8080/17855
Reviewed-by: Riza Suminto <ri...@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
Tested-by: Zoltan Borok-Nagy <bo...@cloudera.com>
---
A bin/diagnostics/experimental/tpcds_run_comparator.py
1 file changed, 297 insertions(+), 0 deletions(-)

Approvals:
  Riza Suminto: Looks good to me, but someone else must approve
  Zoltan Borok-Nagy: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>