Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2023/05/16 15:11:30 UTC

[Impala-ASF-CR] WIP: Optimize delimited output

Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19894 )


Change subject: WIP: Optimize delimited output
......................................................................

WIP: Optimize delimited output

Improvement for select * from tpch_parquet.lineitem:

python2 + hs2: 42s516ms -> 22s335ms
python2 + beeswax: 1m4s -> 22s126ms
python3 + hs2: 30s844ms -> 22s173ms
python3 + beeswax: 20s502ms -> 11s860ms

Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
---
M shell/shell_output.py
1 file changed, 62 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/19894/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13224/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 07 Jun 2023 16:53:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 6:

(4 comments)

Thanks for doing this optimization!

http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py
File shell/shell_output.py:

http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@105
PS6, Line 105:         or result.find(quote) != -1:
Currently we can't turn the try_quick_path flag back to True once it's turned off by an error. It'd be nice to add an "unsafe" option for impala-shell to always (blindly) skip these checks, so we can see how fast the best case can be. On the other hand, if users believe their strings are simple, they can use the "unsafe" option to further boost performance.


http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@123
PS6, Line 123:         pass
nit: don't need 'pass'


http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@149
PS6, Line 149:         self.try_quick_path = False
For supportability, we need to know when (and it would be perfect to also know "why") this is turned off. Or just print it if verbose=true.
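
A minimal sketch of what such reporting could look like, assuming a small state holder with a verbose switch. The class, attribute and message names are hypothetical illustrations, not the actual shell_output.py code:

```python
import io
import sys


class QuickPathState(object):
    """Hypothetical holder for the try_quick_path flag and the reason it was cleared."""

    def __init__(self, verbose=False):
        self.verbose = verbose
        self.try_quick_path = True
        self.disabled_reason = None

    def disable(self, reason, stream=None):
        # Remember why the quick path was abandoned so it can be reported later.
        self.try_quick_path = False
        self.disabled_reason = reason
        if self.verbose:
            if stream is None:
                stream = sys.stderr
            stream.write("Falling back to csv module: %s\n" % reason)


# Usage: capture the message in a buffer instead of writing to stderr.
log = io.StringIO()
state = QuickPathState(verbose=True)
state.disable("quote character found in field", stream=log)
```

Keeping the reason on the object means it could also be surfaced outside verbose mode, for example in a summary line.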


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1628
PS6, Line 1628: 
Can we add tests for decoding errors? We can use the queries in test_utf8_decoding_error_handling(), e.g.

  select unhex('aa')
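
For context, unhex('aa') yields the single byte 0xAA, which is not valid UTF-8 on its own, so strict decoding fails. A minimal Python 3 sketch of the situation such a test would exercise; the helper name and the choice of the "replace" error handler are assumptions, not the shell's actual error handling:

```python
def decode_or_replace(raw):
    """Decode bytes as UTF-8; on failure, substitute replacement characters."""
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # 0xAA is a continuation byte with no leading byte, so strict decoding raises.
        return raw.decode("utf-8", errors="replace"), False


text, ok = decode_or_replace(b"\xaa")
plain_text, plain_ok = decode_or_replace(b"abc")
```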




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 05 Jun 2023 09:14:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 8:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19894/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19894/6//COMMIT_MSG@16
PS6, Line 16: python2 + hs2: 42s516ms -> 22s335ms
> Optional: you could also include a percentage of performance improvement, i
I didn't want to add percentages as I cannot say something like "ClientFetchWaitTimer was improved 50% for Python3 + HS2" - this really depends on other factors like data types and number of columns/rows. This specific benchmark shows that the improvement is significant, but I wouldn't want to extrapolate to other queries.


http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py
File shell/shell_output.py:

http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@102
PS6, Line 102: [
> Optional: You could use () instead of [], which would create a lazy iterato
Stayed with [] as I do not know lazy iterators that well. I tested it and the speed didn't change.
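
To illustrate the two forms being compared, here is a small sketch that assumes the bracket in question sits inside a join() call (the field values are made up). In CPython, str.join() collects a generator into a sequence internally before building the result, which may be why no speed difference was measured:

```python
fields = [1, "abc", 3.5]

# List comprehension: builds the full list first, then joins it.
joined_list = ",".join([str(f) for f in fields])

# Generator expression: hands items to join() lazily, one at a time.
joined_gen = ",".join(str(f) for f in fields)
```

Both forms produce the same string; the choice is mostly about readability here.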


http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@105
PS6, Line 105:         or result.find(quote) != -1:
> Currently we can't turn the try_quick_path flag back to True once it's turn
I wouldn't complicate this part as I am considering actually implementing quoting in the future to handle all cases. The quoting rules are not too complex but would need robust testing to ensure that we always return the same results as Python's csv module.

The find/count functions add minimal overhead; the only step that is somewhat significant is decoding.

Note that DelimitedOutputFormatter is created per query, so if there are several queries the shell will retry the quick path in each query.
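
The quick-path idea can be sketched roughly as follows for Python 3, assuming rows arrive as lists of already-stringified values. The class and method names are hypothetical, the validity checks are only one plausible way to use count/find, and the sketch does not claim byte-for-byte parity with the csv module in every edge case:

```python
import csv
import io


class QuickDelimitedWriter(object):
    """Hypothetical sketch: join rows directly, fall back to csv if quoting is needed."""

    def __init__(self, delimiter=",", quote='"'):
        self.delimiter = delimiter
        self.quote = quote
        # A new writer is assumed per query, so each query retries the quick path.
        self.try_quick_path = True

    def _quick(self, rows):
        # Build the whole batch as one string without any per-field quoting logic.
        result = "\n".join(self.delimiter.join(row) for row in rows)
        expected_delims = sum(max(len(row) - 1, 0) for row in rows)
        expected_newlines = max(len(rows) - 1, 0)
        # Any extra delimiter, line break or quote means a field needed quoting.
        if (result.count(self.delimiter) != expected_delims
                or result.count("\n") != expected_newlines
                or result.find("\r") != -1
                or result.find(self.quote) != -1):
            return None
        return result

    def _slow(self, rows):
        out = io.StringIO()
        writer = csv.writer(out, delimiter=self.delimiter,
                            quotechar=self.quote, lineterminator="\n")
        writer.writerows(rows)
        return out.getvalue().rstrip("\n")

    def format(self, rows):
        if self.try_quick_path:
            result = self._quick(rows)
            if result is not None:
                return result
            # Stay on the csv path for the rest of this writer's lifetime.
            self.try_quick_path = False
        return self._slow(rows)
```

Checking the joined string once per batch keeps the validation cost low compared with inspecting every field separately.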


http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@123
PS6, Line 123: 
> nit: don't need 'pass'
Done


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@33
PS6, Line 33: try:
> Could we check the python version and branch on that instead of try-except?
This was copied from shell_output.py and I am not sure that the comments are always correct (is cStringIO always available on Python2?).
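
The two approaches under discussion can be sketched side by side. This is an illustration only; the exact import at line 33 is not shown in this thread, so the module names below are assumptions based on the cStringIO mention:

```python
import sys

# Variant 1: try the Python 2 module first and fall back on ImportError.
try:
    from cStringIO import StringIO  # only exists on Python 2
except ImportError:
    from io import StringIO  # Python 3

# Variant 2: branch on the interpreter version explicitly.
if sys.version_info.major == 2:
    from cStringIO import StringIO as StringIOByVersion
else:
    from io import StringIO as StringIOByVersion

buf = StringIO()
buf.write("a,b\n")
```

The try-except form keeps working even if the assumption about which module exists on which version turns out to be wrong, which is the concern raised in the reply above.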


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1597
PS6, Line 1597:  
> Nit: result sets.
Done


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1599
PS6, Line 1599: .
> Nit: . (full stop).
Done


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1628
PS6, Line 1628: 
> Can we add tests for decoding errors? We can use the queries in test_utf8_d
Added a test and bumped into an issue with Hive: HIVE-27418 (and learned that with strict hs2 protocol the shell tests connect to Hive).




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 07 Jun 2023 16:51:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Daniel Becker, Jason Fehr, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19894

to look at the new patch set (#8).

Change subject: IMPALA-12171: Optimize delimited output
......................................................................

IMPALA-12171: Optimize delimited output

The change adds a CSV generator that can handle simple cases
(no characters that need quoting) and falls back to Python's
builtin csv module if special characters are found.

Improvement of ClientFetchWaitTimer for
select * from tpch_parquet.lineitem:

python2 + hs2: 42s516ms -> 22s335ms
python2 + beeswax: 1m4s -> 22s126ms
python3 + hs2: 30s844ms -> 22s173ms
python3 + beeswax: 20s502ms -> 11s860ms

The different amount of improvement per protocol/Python version
probably comes from the varying amount of utf-8 conversions that the
patch avoids. It seems that doing the conversion for the large
string after concatenation is much faster than doing it for a
lot of small strings.
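
One way to check this observation locally is a small timeit comparison. This is a rough Python 3 sketch with made-up field data, not the benchmark used for the numbers above, and the absolute timings will vary by machine and interpreter:

```python
import timeit

# Hypothetical sample data: 1000 short byte strings standing in for column values.
fields = [b"value%d" % i for i in range(1000)]


def decode_each():
    # Convert every small field on its own, then join the text pieces.
    return ",".join(f.decode("utf-8") for f in fields)


def decode_once():
    # Join the raw bytes first, then convert the one large string.
    return b",".join(fields).decode("utf-8")


print("per field: ", timeit.timeit(decode_each, number=200))
print("after join:", timeit.timeit(decode_once, number=200))
```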

Testing:
- added some shell tests with special characters and ran them

Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
---
M shell/shell_output.py
M tests/shell/test_shell_commandline.py
2 files changed, 123 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/19894/8

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Jason Fehr, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19894

to look at the new patch set (#6).

Change subject: IMPALA-12171: Optimize delimited output
......................................................................

IMPALA-12171: Optimize delimited output

The change adds a CSV generator that can handle simple cases
(no characters that need quoting) and falls back to Python's
builtin csv module if special characters are found.

Improvement of ClientFetchWaitTimer for
select * from tpch_parquet.lineitem:

python2 + hs2: 42s516ms -> 22s335ms
python2 + beeswax: 1m4s -> 22s126ms
python3 + hs2: 30s844ms -> 22s173ms
python3 + beeswax: 20s502ms -> 11s860ms

The different amount of improvement per protocol/Python version
probably comes from the varying amount of utf-8 conversions that the
patch avoids. It seems that doing the conversion for the large
string after concatenation is much faster than doing it for a
lot of small strings.

Testing:
- added some shell tests with special characters and ran them

Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
---
M shell/shell_output.py
M tests/shell/test_shell_commandline.py
2 files changed, 114 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/19894/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 9:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13246/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 09 Jun 2023 18:06:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Daniel Becker, Jason Fehr, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19894

to look at the new patch set (#7).

Change subject: IMPALA-12171: Optimize delimited output
......................................................................

IMPALA-12171: Optimize delimited output

The change adds a CSV generator that can handle simple cases
(no characters that need quoting) and falls back to Python's
builtin csv module if special characters are found.

Improvement of ClientFetchWaitTimer for
select * from tpch_parquet.lineitem:

python2 + hs2: 42s516ms -> 22s335ms
python2 + beeswax: 1m4s -> 22s126ms
python3 + hs2: 30s844ms -> 22s173ms
python3 + beeswax: 20s502ms -> 11s860ms

The different amount of improvement per protocol/Python version
probably comes from the varying amount of utf-8 conversions that the
patch avoids. It seems that doing the conversion for the large
string after concatenation is much faster than doing it for a
lot of small strings.

Testing:
- added some shell tests with special characters and ran them

Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
---
M shell/shell_output.py
M tests/shell/test_shell_commandline.py
M tests/shell/util.py
3 files changed, 129 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/19894/7

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WIP: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: WIP: Optimize delimited output
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13058/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 May 2023 15:36:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: Optimize delimited output

Posted by "Jason Fehr (Code Review)" <ge...@cloudera.org>.
Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: WIP: Optimize delimited output
......................................................................


Patch Set 4: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 May 2023 16:58:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13143/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 May 2023 15:04:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Jason Fehr (Code Review)" <ge...@cloudera.org>.
Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 6: Code-Review+1

looks good to me!



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 May 2023 22:01:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: WIP: Optimize delimited output
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19894/2/shell/shell_output.py
File shell/shell_output.py:

http://gerrit.cloudera.org:8080/#/c/19894/2/shell/shell_output.py@86
PS2, Line 86:  
flake8: E222 multiple spaces after operator


http://gerrit.cloudera.org:8080/#/c/19894/2/shell/shell_output.py@86
PS2, Line 86:  
flake8: E272 multiple spaces before keyword


http://gerrit.cloudera.org:8080/#/c/19894/2/shell/shell_output.py@87
PS2, Line 87: .
flake8: E501 line too long (106 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/19894/2/shell/shell_output.py@88
PS2, Line 88: s
flake8: E501 line too long (109 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/19894/2/shell/shell_output.py@89
PS2, Line 89:  
flake8: E222 multiple spaces after operator


http://gerrit.cloudera.org:8080/#/c/19894/2/shell/shell_output.py@90
PS2, Line 90:  
flake8: E272 multiple spaces before keyword


http://gerrit.cloudera.org:8080/#/c/19894/2/shell/shell_output.py@94
PS2, Line 94: =
flake8: E225 missing whitespace around operator


http://gerrit.cloudera.org:8080/#/c/19894/2/shell/shell_output.py@95
PS2, Line 95:  
flake8: E222 multiple spaces after operator




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 May 2023 15:14:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: Optimize delimited output

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19894

to look at the new patch set (#4).

Change subject: WIP: Optimize delimited output
......................................................................

WIP: Optimize delimited output

Improvement for select * from tpch_parquet.lineitem:

python2 + hs2: 42s516ms -> 22s335ms
python2 + beeswax: 1m4s -> 22s126ms
python3 + hs2: 30s844ms -> 22s173ms
python3 + beeswax: 20s502ms -> 11s860ms

Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
---
M shell/shell_output.py
1 file changed, 71 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/19894/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19894/7/tests/shell/util.py
File tests/shell/util.py:

http://gerrit.cloudera.org:8080/#/c/19894/7/tests/shell/util.py@345
PS7, Line 345: #
flake8: E265 block comment should start with '# '


http://gerrit.cloudera.org:8080/#/c/19894/7/tests/shell/util.py@346
PS7, Line 346: #
flake8: E265 block comment should start with '# '


http://gerrit.cloudera.org:8080/#/c/19894/7/tests/shell/util.py@348
PS7, Line 348: #
flake8: E265 block comment should start with '# '




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 07 Jun 2023 16:32:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 10:

GVO failed in new test case



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 16 Jun 2023 02:13:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Daniel Becker, Jason Fehr, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19894

to look at the new patch set (#9).

Change subject: IMPALA-12171: Optimize delimited output
......................................................................

IMPALA-12171: Optimize delimited output

The change adds a CSV generator that can handle simple cases
(no characters that need quoting) and falls back to Python's
builtin csv module if special characters are found.

Improvement of ClientFetchWaitTimer for
select * from tpch_parquet.lineitem:

python2 + hs2: 42s516ms -> 22s335ms
python2 + beeswax: 1m4s -> 22s126ms
python3 + hs2: 30s844ms -> 22s173ms
python3 + beeswax: 20s502ms -> 11s860ms

The different amount of improvement per protocol/Python version
probably comes from the varying amount of utf-8 conversions that the
patch avoids. It seems that doing the conversion for the large
string after concatenation is much faster than doing it for a
lot of small strings.

Testing:
- added some shell tests with special characters and ran them

Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
---
M shell/impala_shell.py
M shell/shell_output.py
M tests/shell/test_shell_commandline.py
3 files changed, 133 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/19894/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WIP: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: WIP: Optimize delimited output
......................................................................


Patch Set 1:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py
File shell/shell_output.py:

http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py@86
PS1, Line 86:  
flake8: E222 multiple spaces after operator


http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py@86
PS1, Line 86:  
flake8: E272 multiple spaces before keyword


http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py@87
PS1, Line 87: .
flake8: E501 line too long (106 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py@88
PS1, Line 88: s
flake8: E501 line too long (109 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py@89
PS1, Line 89:  
flake8: E222 multiple spaces after operator


http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py@90
PS1, Line 90:  
flake8: E272 multiple spaces before keyword


http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py@94
PS1, Line 94: =
flake8: E225 missing whitespace around operator


http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py@95
PS1, Line 95:  
flake8: E222 multiple spaces after operator


http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py@127
PS1, Line 127:  
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py@127
PS1, Line 127:     return result 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py@202
PS1, Line 202: #
flake8: E265 block comment should start with '# '


http://gerrit.cloudera.org:8080/#/c/19894/1/shell/shell_output.py@219
PS1, Line 219: o
flake8: E111 indentation is not a multiple of 2




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 May 2023 15:12:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: WIP: Optimize delimited output
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13089/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 May 2023 16:07:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 6:

(6 comments)

Thanks Csaba, a few remarks.

http://gerrit.cloudera.org:8080/#/c/19894/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19894/6//COMMIT_MSG@16
PS6, Line 16: python2 + hs2: 42s516ms -> 22s335ms
Optional: you could also include the percentage improvement; it makes the gain easier to grasp.
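
A minimal sketch of how such percentages could be derived from the timings quoted in the commit message. The helpers `to_seconds` and `reduction_pct` are illustrative names, not part of the patch.

```python
import re

# Timings copied from the commit message: (before, after).
TIMINGS = {
    "python2 + hs2": ("42s516ms", "22s335ms"),
    "python2 + beeswax": ("1m4s", "22s126ms"),
    "python3 + hs2": ("30s844ms", "22s173ms"),
    "python3 + beeswax": ("20s502ms", "11s860ms"),
}

_UNIT_SECONDS = {"m": 60.0, "s": 1.0, "ms": 0.001}


def to_seconds(text):
    """Convert a duration such as '1m4s' or '42s516ms' to seconds."""
    # 'ms' is listed first so that it is not mistaken for minutes.
    parts = re.findall(r"(\d+)(ms|m|s)", text)
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def reduction_pct(before, after):
    """Percentage by which the runtime shrank."""
    b, a = to_seconds(before), to_seconds(after)
    return 100.0 * (b - a) / b


for name, (before, after) in TIMINGS.items():
    print("%s: %.1f%% less time" % (name, reduction_pct(before, after)))
```

With the numbers above this works out to roughly 47%, 65%, 28% and 42% less time, in the order listed.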


http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py
File shell/shell_output.py:

http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@102
PS6, Line 102: [
Optional: You could use () instead of [], which would create a lazy iterator instead of a list.
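
To illustrate the suggestion, a short sketch of the difference between the two forms. The `row` data is made up and this is not the code at line 102.

```python
import sys

row = [1, None, "abc", 2.5]

# List comprehension: builds the whole intermediate list first.
as_list = [str(v) for v in row]

# Generator expression: yields the values one at a time, on demand.
as_gen = (str(v) for v in row)

# The generator object stays small no matter how long the input is.
print(sys.getsizeof(as_list), sys.getsizeof(as_gen))

# Both can be passed straight to str.join() and give the same result.
joined_from_list = ",".join(as_list)
joined_from_gen = ",".join(as_gen)
assert joined_from_list == joined_from_gen

# Caveat: a generator can be consumed only once.
assert list(as_gen) == []
```

Note that CPython's `str.join()` materializes its argument into a sequence internally, so if the expression feeds a join directly the saving may be small.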


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@33
PS6, Line 33: try:
Could we check the python version and branch on that instead of try-except?
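
A sketch of the two styles being compared. The code at line 33 is not quoted in this thread, so this assumes the try/except guards a Python 2 vs Python 3 difference such as an import; the `StringIO` import is only an example.

```python
import sys

# try/except style: works, but does not say which interpreter
# each branch is meant for.
try:
    from StringIO import StringIO  # Python 2 only
except ImportError:
    from io import StringIO  # Python 3

# Explicit version check: the intent is visible at a glance.
if sys.version_info[0] >= 3:
    from io import StringIO
else:
    from StringIO import StringIO  # Python 2 only

buf = StringIO()
buf.write(u"a,b\n")
print(buf.getvalue())
```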


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1597
PS6, Line 1597: that need special handling
We also test the simple case when there is no need for special handling, right?


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1597
PS6, Line 1597: s
Nit: result sets.


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1599
PS6, Line 1599: ,
Nit: . (full stop).




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Jun 2023 09:16:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 9: Code-Review+1

(1 comment)

LGTM. I can bump to +2 if no more comments. Thanks for working on this!

http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py
File shell/shell_output.py:

http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@105
PS6, Line 105:         or result.count(delimiter) != expected_delimiters \
> I wouldn't complicate this part as I am considering actually implementing q
Ack




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 15 Jun 2023 00:39:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13142/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 May 2023 14:54:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 10: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9405/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 16 Jun 2023 01:02:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py
File shell/shell_output.py:

http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@149
PS6, Line 149:         return result
> For supportability, we need to know when (and would be perfect to have "why
Added logs for fallback cases and tested it manually.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 09 Jun 2023 17:49:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: WIP: Optimize delimited output
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13057/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 May 2023 15:43:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 9: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 15 Jun 2023 19:29:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: WIP: Optimize delimited output
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13056/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 May 2023 15:34:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19894/5/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/19894/5/tests/shell/test_shell_commandline.py@1608
PS5, Line 1608:  
flake8: E222 multiple spaces after operator




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 May 2023 14:35:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 10: Code-Review+1

Thanks. It's OK with me once the test failure is fixed.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Jun 2023 08:58:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: Optimize delimited output

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19894

to look at the new patch set (#3).

Change subject: WIP: Optimize delimited output
......................................................................

WIP: Optimize delimited output

Improvement for select * from tpch_parquet.lineitem:

python2 + hs2: 42s516ms -> 22s335ms
python2 + beeswax: 1m4s -> 22s126ms
python3 + hs2: 30s844ms -> 22s173ms
python3 + beeswax: 20s502ms -> 11s860ms

Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
---
M shell/shell_output.py
1 file changed, 61 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/19894/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] WIP: Optimize delimited output

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19894

to look at the new patch set (#2).

Change subject: WIP: Optimize delimited output
......................................................................

WIP: Optimize delimited output

Improvement for select * from tpch_parquet.lineitem:

python2 + hs2: 42s516ms -> 22s335ms
python2 + beeswax: 1m4s -> 22s126ms
python3 + hs2: 30s844ms -> 22s173ms
python3 + beeswax: 20s502ms -> 11s860ms

Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
---
M shell/shell_output.py
1 file changed, 59 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/19894/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13225/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 07 Jun 2023 16:56:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9405/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 15 Jun 2023 19:39:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12171: Optimize delimited output

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Jason Fehr, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19894

to look at the new patch set (#5).

Change subject: IMPALA-12171: Optimize delimited output
......................................................................

IMPALA-12171: Optimize delimited output

The change adds a CSV generator that can handle simple cases
(no characters that need quoting) and falls back to Python's
builtin csv module if special characters are found.
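
A rough sketch of the idea described above, not the code in the patch. It assumes Python 3 and rows whose fields are already strings; `format_delimited` and `write_with_csv_module` are hypothetical names. The delimiter-count check is modeled on the `result.count(delimiter) != expected_delimiters` line quoted in the review.

```python
import csv
import io


def write_with_csv_module(rows, delimiter):
    """Slow path: let the csv module decide about quoting."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


def format_delimited(rows, delimiter=","):
    """Fast path for rows without special characters, else csv fallback."""
    lines = []
    for row in rows:
        line = delimiter.join(row)
        needs_csv = (
            # csv writes a lone empty field as "", so let it handle that.
            not line
            or '"' in line
            or "\n" in line
            or "\r" in line
            # A field containing the delimiter shows up as an extra one.
            or line.count(delimiter) != len(row) - 1
        )
        if needs_csv:
            # Give up on the fast path for the whole batch.
            return write_with_csv_module(rows, delimiter)
        lines.append(line)
    return "".join(line + "\n" for line in lines)


print(format_delimited([["1", "abc"], ["2", "def"]]), end="")
print(format_delimited([["1", "a,b"], ["2", 'say "hi"']]), end="")
```

The fast path only ever emits rows that need no quoting, so for those rows its output should match what the csv module would have written.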

Improvement of ClientFetchWaitTimer for
select * from tpch_parquet.lineitem:

python2 + hs2: 42s516ms -> 22s335ms
python2 + beeswax: 1m4s -> 22s126ms
python3 + hs2: 30s844ms -> 22s173ms
python3 + beeswax: 20s502ms -> 11s860ms

The improvement differs per protocol/Python version, probably
because the number of UTF-8 conversions that the patch avoids
varies. Converting one large string after concatenation seems to
be much faster than converting many small strings.
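
The effect described in this paragraph can be checked with a small micro-benchmark along these lines. This is an illustrative sketch with made-up data, not a measurement from the patch, and the timings will vary by interpreter and machine.

```python
import timeit

fields = [u"field%d" % i for i in range(100000)]


def encode_each():
    # One UTF-8 conversion per small string, then join the bytes.
    return b"\t".join(f.encode("utf-8") for f in fields)


def encode_once():
    # Join first, then a single conversion of the large string.
    return u"\t".join(fields).encode("utf-8")


# Both approaches must produce identical bytes.
assert encode_each() == encode_once()

print("encode each: %.3fs" % timeit.timeit(encode_each, number=10))
print("encode once: %.3fs" % timeit.timeit(encode_once, number=10))
```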

Testing:
- added some shell tests with special characters and ran them

Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
---
M shell/shell_output.py
M tests/shell/test_shell_commandline.py
2 files changed, 114 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/19894/5

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>