You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Fredy Wijaya (Code Review)" <ge...@cloudera.org> on 2018/07/13 08:47:41 UTC

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Fredy Wijaya has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10939


Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................

IMPALA-7259: Improve Impala shell performance

This patch fixes the slow performance in Impala shell by replacing all
calls to sqlparse.format(sql, strip_comments=True) with the custom
implementation of strip comments that does not use grouping. The code
to strip leading comments was also refactored to not use grouping.

* Benchmark running a query with 12K columns *

Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real    2m4.154s
user    2m0.536s
sys     0m0.088s

After the patch:
$ time impala-shell.sh -f large.sql --quiet
real    0m3.885s
user    0m1.516s
sys     0m0.048s

Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests

Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
2 files changed, 46 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/4
-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 4
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................

IMPALA-7259: Improve Impala shell performance

This patch fixes the slow performance in Impala shell, especially for
large queries by replacing all calls to sqlparse.format(sql_string,
strip_comments=True) with the custom implementation of strip comments
that does not use grouping. The code to strip leading comments was also
refactored to not use grouping.

* Benchmark running a query with 12K columns *

Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real    2m4.154s
user    2m0.536s
sys     0m0.088s

After the patch:
$ time impala-shell.sh -f large.sql --quiet
real    0m3.885s
user    0m1.516s
sys     0m0.048s

Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests on Python 2.6 and Python 2.7

Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
A tests/shell/test_file_large.sql
M tests/shell/test_shell_commandline.py
3 files changed, 16,051 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/6
-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 6
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2832/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 10
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Jul 2018 16:30:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 9: Code-Review+1

Carry Mike's +1


-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 9
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 22:43:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 7: Code-Review+1

(1 comment)

for the test

http://gerrit.cloudera.org:8080/#/c/10939/7/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/10939/7/tests/shell/test_shell_commandline.py@31
PS7, Line 31: from time import sleep
            : from time import time
Sorry, you can do:

  from time import sleep, time



-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 7
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 17:03:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10939/6/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/10939/6/tests/shell/test_shell_commandline.py@664
PS6, Line 664: f test_large_sql(self
> It's preferred style to keep imports at the top of modules. You can add tim
Done


http://gerrit.cloudera.org:8080/#/c/10939/6/tests/shell/test_shell_commandline.py@672
PS6, Line 672:     assert actual_time_s <= time_limit_s, (
             :         "It took {0} seconds to execute the query. Time limit is {1} seconds.".format(
             :             actual_time_s, time_limit_s))
> assert actual_time_s <= time_limit_s, (
Ooops, I must've been too sleepy last night :) Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 7
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 16:14:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................

IMPALA-7259: Improve Impala shell performance

This patch fixes the slow performance in Impala shell, especially for
large queries by replacing all calls to sqlparse.format(sql_string,
strip_comments=True) with the custom implementation of strip comments
that does not use grouping. The code to strip leading comments was also
refactored to not use grouping.

* Benchmark running a query with 12K columns *

Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real    2m4.154s
user    2m0.536s
sys     0m0.088s

After the patch:
$ time impala-shell.sh -f large.sql --quiet
real    0m3.885s
user    0m1.516s
sys     0m0.048s

Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests on Python 2.6 and Python 2.7

Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Reviewed-on: http://gerrit.cloudera.org:8080/10939
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
2 files changed, 70 insertions(+), 15 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 14
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 11: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2836/


-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 11
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Jul 2018 01:59:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 11: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 11
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Jul 2018 22:45:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 10: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 10
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Jul 2018 16:30:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................

IMPALA-7259: Improve Impala shell performance

This patch fixes the slow performance in Impala shell, especially for
large queries by replacing all calls to sqlparse.format(sql_string,
strip_comments=True) with the custom implementation of strip comments
that does not use grouping. The code to strip leading comments was also
refactored to not use grouping.

* Benchmark running a query with 12K columns *

Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real    2m4.154s
user    2m0.536s
sys     0m0.088s

After the patch:
$ time impala-shell.sh -f large.sql --quiet
real    0m3.885s
user    0m1.516s
sys     0m0.048s

Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests on Python 2.6 and Python 2.7

Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
A tests/shell/test_file_large.sql
M tests/shell/test_shell_commandline.py
3 files changed, 16,051 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/8
-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 8
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10939/8/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/10939/8/shell/impala_shell.py@72
PS8, Line 72:   return ''.join(stack.run(sql, 'utf-8')).strip()
> I noticed that sqlparse.format() uses an unicode string here: https://githu
I don't think it makes a difference and I believe we use utf-8 everywhere in Impala shell.


http://gerrit.cloudera.org:8080/#/c/10939/8/tests/shell/test_file_large.sql
File tests/shell/test_file_large.sql:

PS8: 
> Trying to boost your LoC count? :)
LOL, I will auto-generate it instead.



-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 9
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 22:19:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................

IMPALA-7259: Improve Impala shell performance

This patch fixes the slow performance in Impala shell by replacing all
calls to sqlparse.format(sql, strip_comments=True) with the custom
implementation of strip comments that does not use grouping. The code
to strip leading comments was also refactored to not use grouping.

* Benchmark running a query with 12K columns *

Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real    2m4.154s
user    2m0.536s
sys     0m0.088s

After the patch:
$ time impala-shell.sh -f large.sql --quiet
real    0m3.885s
user    0m1.516s
sys     0m0.048s

Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests

Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
A tests/shell/test_file_large.sql
M tests/shell/test_shell_commandline.py
3 files changed, 16,051 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/5
-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 5
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................

IMPALA-7259: Improve Impala shell performance

This patch fixes the slow performance in Impala shell, especially for
large queries by replacing all calls to sqlparse.format(sql_string,
strip_comments=True) with the custom implementation of strip comments
that does not use grouping. The code to strip leading comments was also
refactored to not use grouping.

* Benchmark running a query with 12K columns *

Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real    2m4.154s
user    2m0.536s
sys     0m0.088s

After the patch:
$ time impala-shell.sh -f large.sql --quiet
real    0m3.885s
user    0m1.516s
sys     0m0.048s

Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests on Python 2.6 and Python 2.7

Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
A tests/shell/test_file_large.sql
M tests/shell/test_shell_commandline.py
3 files changed, 16,051 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/7
-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 7
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 9: Code-Review+2

> Patch Set 9: Code-Review+1
> 
> I'll give Nghia a chance to look too.

I spoke to Nghia and he said the CR looks good. I'm going to give it a +2.


-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 9
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Jul 2018 16:29:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................

IMPALA-7259: Improve Impala shell performance

This patch fixes the slow performance in Impala shell, especially for
large queries by replacing all calls to sqlparse.format(sql_string,
strip_comments=True) with the custom implementation of strip comments
that does not use grouping. The code to strip leading comments was also
refactored to not use grouping.

* Benchmark running a query with 12K columns *

Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real    2m4.154s
user    2m0.536s
sys     0m0.088s

After the patch:
$ time impala-shell.sh -f large.sql --quiet
real    0m3.885s
user    0m1.516s
sys     0m0.048s

Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests on Python 2.6 and Python 2.7

Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
2 files changed, 70 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/9
-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 9
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10939/6/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/10939/6/tests/shell/test_shell_commandline.py@664
PS6, Line 664: from time import time
It's preferred style to keep imports at the top of modules. You can add time to L31.


http://gerrit.cloudera.org:8080/#/c/10939/6/tests/shell/test_shell_commandline.py@672
PS6, Line 672:     if actual_time_s > time_limit_s:
             :       assert False, "It took {0} seconds to execute the query. " \
             :                     "Time limit is {1} seconds.".format(actual_time_s, time_limit_s)
assert actual_time_s <= time_limit_s, (
      "It took {0} seconds to execute the query. Time limit is {1} seconds.".format(
          actual_time_s, time_limit_s))



-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 6
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 15:58:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 8:

(2 comments)

Thanks for fixing this!

http://gerrit.cloudera.org:8080/#/c/10939/8/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/10939/8/shell/impala_shell.py@72
PS8, Line 72:   return ''.join(stack.run(sql, 'utf-8')).strip()
I noticed that sqlparse.format() uses an unicode string here: https://github.com/andialbrecht/sqlparse/blob/396f19d00fa20a8fc891ab9351bee64b334aac3a/sqlparse/__init__.py#L60

Does this make a difference? Or does '' get coerced to unicode anyway?


http://gerrit.cloudera.org:8080/#/c/10939/8/tests/shell/test_file_large.sql
File tests/shell/test_file_large.sql:

PS8: 
Trying to boost your LoC count? :)

Seriously though, this is ok, but it might be better to programmatically generate this in the test and write it to a temporary file.



-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 8
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 21:20:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2836/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 11
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Jul 2018 22:45:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................

IMPALA-7259: Improve Impala shell performance

This patch fixes the slow performance in Impala shell, especially for
large queries by replacing all calls to sqlparse.format(sql_string,
strip_comments=True) with the custom implementation of strip comments
that does not use grouping. The code to strip leading comments was also
refactored to not use grouping.

* Benchmark running a query with 12K columns *

Before the patch:
$ time impala-shell.sh -f large.sql --quiet
real    2m4.154s
user    2m0.536s
sys     0m0.088s

After the patch:
$ time impala-shell.sh -f large.sql --quiet
real    0m3.885s
user    0m1.516s
sys     0m0.048s

Testing:
- Added a new test to test the Impala shell performance
- Ran all shell tests on Python 2.6 and Python 2.7

Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
---
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
2 files changed, 70 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10939/13
-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 13
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 9: Code-Review+1

I'll give Nghia a chance to look too.


-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 9
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 23:25:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7259: Improve Impala shell performance

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/10939 )

Change subject: IMPALA-7259: Improve Impala shell performance
......................................................................


Patch Set 8:

(1 comment)

Carry Mike's +1

http://gerrit.cloudera.org:8080/#/c/10939/7/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/10939/7/tests/shell/test_shell_commandline.py@31
PS7, Line 31: from time import sleep, time
            : from util import IMPA
> Sorry, you can do:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac9f3caed7c44846a8c922dbe5ca3bf3b095b81
Gerrit-Change-Number: 10939
Gerrit-PatchSet: 8
Gerrit-Owner: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Jul 2018 17:53:59 +0000
Gerrit-HasComments: Yes