You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Fredy Wijaya (Code Review)" <ge...@cloudera.org> on 2018/07/13 08:47:41 UTC
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Fredy Wijaya has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10939
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
IMPALA-7259: Improve Impala shell performance
This patch fixes the slow performance in Impala shell by replacing all
calls to sqlparse.format(sql, strip_comments=True) with the custom
implementation of strip comments that does not use grouping. The code
to strip leading comments was also refactored to not use grouping.
* Benchmark running a query with 12K columns *
Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real 2m4.154s
user 2m0.536s
sys 0m0.088s
After the patch:
$ time impala-shell.sh -f large.sql --quiet
real 0m3.885s
user 0m1.516s
sys 0m0.048s
Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests
Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
2 files changed, 46 insertions(+), 14 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/4
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 4
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
IMPALA-7259: Improve Impala shell performance
This patch fixes the slow performance in Impala shell, especially for
large queries by replacing all calls to sqlparse.format(sql_string,
strip_comments=True) with the custom implementation of strip comments
that does not use grouping. The code to strip leading comments was also
refactored to not use grouping.
* Benchmark running a query with 12K columns *
Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real 2m4.154s
user 2m0.536s
sys 0m0.088s
After the patch:
$ time impala-shell.sh -f large.sql --quiet
real 0m3.885s
user 0m1.516s
sys 0m0.048s
Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests on Python 2.6 and Python 2.7
Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
A tests/shell/test_file_large.sql
M tests/shell/test_shell_commandline.py
3 files changed, 16,051 insertions(+), 14 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/6
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 6
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 10:
Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2832/ DRY_RUN=false
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 10
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Jul 2018 16:30:23 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 9: Code-Review+1
Carry Mike's +1
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 9
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 22:43:37 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 7: Code-Review+1
(1 comment)
for the test
http://gerrit.cloudera.org:8080/#/c/10939/7/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:
http://gerrit.cloudera.org:8080/#/c/10939/7/tests/shell/test_shell_commandline.py@31
PS7, Line 31: from time import sleep
: from time import time
Sorry, you can do:
from time import sleep, time
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 7
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 17:03:08 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 7:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/10939/6/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:
http://gerrit.cloudera.org:8080/#/c/10939/6/tests/shell/test_shell_commandline.py@664
PS6, Line 664: f test_large_sql(self
> It's preferred style to keep imports at the top of modules. You can add tim
Done
http://gerrit.cloudera.org:8080/#/c/10939/6/tests/shell/test_shell_commandline.py@672
PS6, Line 672: assert actual_time_s <= time_limit_s, (
: "It took {0} seconds to execute the query. Time limit is {1} seconds.".format(
: actual_time_s, time_limit_s))
> assert actual_time_s <= time_limit_s, (
Ooops, I must've been too sleepy last night :) Done.
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 7
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 16:14:42 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
IMPALA-7259: Improve Impala shell performance
This patch fixes the slow performance in Impala shell, especially for
large queries by replacing all calls to sqlparse.format(sql_string,
strip_comments=True) with the custom implementation of strip comments
that does not use grouping. The code to strip leading comments was also
refactored to not use grouping.
* Benchmark running a query with 12K columns *
Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real 2m4.154s
user 2m0.536s
sys 0m0.088s
After the patch:
$ time impala-shell.sh -f large.sql --quiet
real 0m3.885s
user 0m1.516s
sys 0m0.048s
Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests on Python 2.6 and Python 2.7
Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Reviewed-on: http://gerrit.cloudera.org:8080/10939
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
2 files changed, 70 insertions(+), 15 deletions(-)
Approvals:
Impala Public Jenkins: Looks good to me, approved; Verified
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 14
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 11: Verified-1
Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2836/
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 11
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Jul 2018 01:59:31 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 11: Code-Review+2
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 11
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Jul 2018 22:45:03 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 10: Code-Review+2
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 10
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Jul 2018 16:30:22 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
IMPALA-7259: Improve Impala shell performance
This patch fixes the slow performance in Impala shell, especially for
large queries by replacing all calls to sqlparse.format(sql_string,
strip_comments=True) with the custom implementation of strip comments
that does not use grouping. The code to strip leading comments was also
refactored to not use grouping.
* Benchmark running a query with 12K columns *
Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real 2m4.154s
user 2m0.536s
sys 0m0.088s
After the patch:
$ time impala-shell.sh -f large.sql --quiet
real 0m3.885s
user 0m1.516s
sys 0m0.048s
Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests on Python 2.6 and Python 2.7
Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
A tests/shell/test_file_large.sql
M tests/shell/test_shell_commandline.py
3 files changed, 16,051 insertions(+), 15 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/8
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 8
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 9:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/10939/8/shell/impala_shell.py
File shell/impala_shell.py:
http://gerrit.cloudera.org:8080/#/c/10939/8/shell/impala_shell.py@72
PS8, Line 72: return ''.join(stack.run(sql, 'utf-8')).strip()
> I noticed that sqlparse.format() uses an unicode string here: https://githu
I don't think it makes a difference and I believe we use utf-8 everywhere in Impala shell.
http://gerrit.cloudera.org:8080/#/c/10939/8/tests/shell/test_file_large.sql
File tests/shell/test_file_large.sql:
PS8:
> Trying to boost your LoC count? :)
LOL, I will auto-generate it instead.
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 9
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 22:19:02 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
IMPALA-7259: Improve Impala shell performance
This patch fixes the slow performance in Impala shell by replacing all
calls to sqlparse.format(sql, strip_comments=True) with the custom
implementation of strip comments that does not use grouping. The code
to strip leading comments was also refactored to not use grouping.
* Benchmark running a query with 12K columns *
Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real 2m4.154s
user 2m0.536s
sys 0m0.088s
After the patch:
$ time impala-shell.sh -f large.sql --quiet
real 0m3.885s
user 0m1.516s
sys 0m0.048s
Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests
Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
A tests/shell/test_file_large.sql
M tests/shell/test_shell_commandline.py
3 files changed, 16,051 insertions(+), 14 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/5
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 5
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
IMPALA-7259: Improve Impala shell performance
This patch fixes the slow performance in Impala shell, especially for
large queries by replacing all calls to sqlparse.format(sql_string,
strip_comments=True) with the custom implementation of strip comments
that does not use grouping. The code to strip leading comments was also
refactored to not use grouping.
* Benchmark running a query with 12K columns *
Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real 2m4.154s
user 2m0.536s
sys 0m0.088s
After the patch:
$ time impala-shell.sh -f large.sql --quiet
real 0m3.885s
user 0m1.516s
sys 0m0.048s
Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests on Python 2.6 and Python 2.7
Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
A tests/shell/test_file_large.sql
M tests/shell/test_shell_commandline.py
3 files changed, 16,051 insertions(+), 14 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/7
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 7
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 9: Code-Review+2
> Patch Set 9: Code-Review+1
>
> I'll give Nghia a chance to look too.
I spoke to Nghia and he said the CR looks good. I'm going to give it a +2.
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 9
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Jul 2018 16:29:49 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
IMPALA-7259: Improve Impala shell performance
This patch fixes the slow performance in Impala shell, especially for
large queries by replacing all calls to sqlparse.format(sql_string,
strip_comments=True) with the custom implementation of strip comments
that does not use grouping. The code to strip leading comments was also
refactored to not use grouping.
* Benchmark running a query with 12K columns *
Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real 2m4.154s
user 2m0.536s
sys 0m0.088s
After the patch:
$ time impala-shell.sh -f large.sql --quiet
real 0m3.885s
user 0m1.516s
sys 0m0.048s
Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests on Python 2.6 and Python 2.7
Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
2 files changed, 70 insertions(+), 15 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/9
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 9
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 6:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/10939/6/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:
http://gerrit.cloudera.org:8080/#/c/10939/6/tests/shell/test_shell_commandline.py@664
PS6, Line 664: from time import time
It's preferred style to keep imports at the top of modules. You can add time to L31.
http://gerrit.cloudera.org:8080/#/c/10939/6/tests/shell/test_shell_commandline.py@672
PS6, Line 672: if actual_time_s > time_limit_s:
: assert False, "It took {0} seconds to execute the query. " \
: "Time limit is {1} seconds.".format(actual_time_s, time_limit_s)
assert actual_time_s <= time_limit_s, (
"It took {0} seconds to execute the query. Time limit is {1} seconds.".format(
actual_time_s, time_limit_s))
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 6
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 15:58:52 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 8:
(2 comments)
Thanks for fixing this!
http://gerrit.cloudera.org:8080/#/c/10939/8/shell/impala_shell.py
File shell/impala_shell.py:
http://gerrit.cloudera.org:8080/#/c/10939/8/shell/impala_shell.py@72
PS8, Line 72: return ''.join(stack.run(sql, 'utf-8')).strip()
I noticed that sqlparse.format() uses an unicode string here: https://github.com/andialbrecht/sqlparse/blob/396f19d00fa20a8fc891ab9351bee64b334aac3a/sqlparse/__init__.py#L60
Does this make a difference? Or does '' get coerced to unicode anyway?
http://gerrit.cloudera.org:8080/#/c/10939/8/tests/shell/test_file_large.sql
File tests/shell/test_file_large.sql:
PS8:
Trying to boost your LoC count? :)
Seriously though, this is ok, but it might be better to programmatically generate this in the test and write it to a temporary file.
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 8
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 21:20:11 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 11:
Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2836/ DRY_RUN=false
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 11
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Jul 2018 22:45:04 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
IMPALA-7259: Improve Impala shell performance
This patch fixes the slow performance in Impala shell, especially for
large queries by replacing all calls to sqlparse.format(sql_string,
strip_comments=True) with the custom implementation of strip comments
that does not use grouping. The code to strip leading comments was also
refactored to not use grouping.
* Benchmark running a query with 12K columns *
Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real 2m4.154s
user 2m0.536s
sys 0m0.088s
After the patch:
$ time impala-shell.sh -f large.sql --quiet
real 0m3.885s
user 0m1.516s
sys 0m0.048s
Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests on Python 2.6 and Python 2.7
Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
2 files changed, 70 insertions(+), 15 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/13
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 13
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 9: Code-Review+1
I'll give Nghia a chance to look too.
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 9
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 23:25:24 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance
Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )
Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................
Patch Set 8:
(1 comment)
Carry Mike's +1
http://gerrit.cloudera.org:8080/#/c/10939/7/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:
http://gerrit.cloudera.org:8080/#/c/10939/7/tests/shell/test_shell_commandline.py@31
PS7, Line 31: from time import sleep, time
: from util import IMPA
> Sorry, you can do:
Done
--
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 8
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 17:53:59 +0000
Gerrit-HasComments: Yes