You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Andrew Sherman (Code Review)" <ge...@cloudera.org> on 2019/03/20 17:39:08 UTC

[Impala-ASF-CR] IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

Andrew Sherman has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12812


Change subject: IMPALA-8325: Leading Unicode comments cause Impala Shell failure.
......................................................................

IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

This change fixes a regression introduced by "IMPALA-2195 Improper
handling of comments in queries."

The Impala Shell parses input text into several strings using the
sqlparse library. One of the returned strings is the sql command, this
is used to determine the correct do_<command> method to call. Another of
the returned strings is the leading comment, which is a comment that
appears before legal sql text.

Python2 has strings with multiple encodings. The strings returned from
the sqlparse library have the Unicode encoding. Impala Shell converts
the sql command string to utf-8 encoding before using it.

If the Impala Shell needs to send the sql command to an Impala
Coordinator then it  (re)constructs the query out of the strings
returned by the sqlparse library. This query is sent to the Coordinator
via Beeswax protocol. The query is converted to an ascii string before
being sent. The conversion can fail if the leading comment string
contains Unicode characters, which can't be directly converted to ascii.
So the trigger for the bug is that the leading comment contains Unicode.

The fix is that the  leading comment string should be converted to utf-8
in the same way as the sql command.

TESTING:

Ran all end -to-end tests.
Added two test cases to tests/shell/test_shell_interactive.py

Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
---
M shell/impala_shell.py
M tests/shell/test_shell_interactive.py
2 files changed, 8 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/12812/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Gerrit-Change-Number: 12812
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>

[Impala-ASF-CR] IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12812 )

Change subject: IMPALA-8325: Leading Unicode comments cause Impala Shell failure.
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2486/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Gerrit-Change-Number: 12812
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Mar 2019 18:22:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/12812 )

Change subject: IMPALA-8325: Leading Unicode comments cause Impala Shell failure.
......................................................................

IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

This change fixes a regression introduced by "IMPALA-2195 Improper
handling of comments in queries."

The Impala Shell parses input text into several strings using the
sqlparse library. One of the returned strings is the sql command, this
is used to determine the correct do_<command> method to call. Another of
the returned strings is the leading comment, which is a comment that
appears before legal sql text.

Python2 has strings with multiple encodings. The strings returned from
the sqlparse library have the Unicode encoding. Impala Shell converts
the sql command string to utf-8 encoding before using it.

If the Impala Shell needs to send the sql command to an Impala
Coordinator then it (re)constructs the query out of the strings
returned by the sqlparse library. This query is sent to the Coordinator
via Beeswax protocol. The query is converted to an ascii string before
being sent. The conversion can fail if the leading comment string
contains Unicode characters, which can't be directly converted to ascii.
So the trigger for the bug is that the leading comment contains Unicode.

The fix is that the leading comment string should be converted to utf-8
in the same way as the sql command.

TESTING:

Ran all end -to-end tests.
Added two test cases to tests/shell/test_shell_interactive.py

Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
---
M shell/impala_shell.py
M tests/shell/test_shell_interactive.py
2 files changed, 8 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/12812/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Gerrit-Change-Number: 12812
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12812 )

Change subject: IMPALA-8325: Leading Unicode comments cause Impala Shell failure.
......................................................................


Patch Set 1:

(4 comments)

Thanks Fredy

http://gerrit.cloudera.org:8080/#/c/12812/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12812/1//COMMIT_MSG@23
PS1, Line 23:  
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/12812/1//COMMIT_MSG@30
PS1, Line 30:  
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/12812/1/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/12812/1/shell/impala_shell.py@1339
PS1, Line 1339: (leading_comment)
> nit: unnecessary parentheses in python
Thanks


http://gerrit.cloudera.org:8080/#/c/12812/1/shell/impala_shell.py@1340
PS1, Line 1340: leading_comment.encode('utf-8')
> I missed this in my previous patch :( Thanks for the fix!
Thanks for the quick review!



-- 
To view, visit http://gerrit.cloudera.org:8080/12812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Gerrit-Change-Number: 12812
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Mar 2019 18:04:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12812 )

Change subject: IMPALA-8325: Leading Unicode comments cause Impala Shell failure.
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2484/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Gerrit-Change-Number: 12812
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Mar 2019 18:07:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12812 )

Change subject: IMPALA-8325: Leading Unicode comments cause Impala Shell failure.
......................................................................

IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

This change fixes a regression introduced by "IMPALA-2195 Improper
handling of comments in queries."

The Impala Shell parses input text into several strings using the
sqlparse library. One of the returned strings is the sql command, this
is used to determine the correct do_<command> method to call. Another of
the returned strings is the leading comment, which is a comment that
appears before legal sql text.

Python2 has strings with multiple encodings. The strings returned from
the sqlparse library have the Unicode encoding. Impala Shell converts
the sql command string to utf-8 encoding before using it.

If the Impala Shell needs to send the sql command to an Impala
Coordinator then it (re)constructs the query out of the strings
returned by the sqlparse library. This query is sent to the Coordinator
via Beeswax protocol. The query is converted to an ascii string before
being sent. The conversion can fail if the leading comment string
contains Unicode characters, which can't be directly converted to ascii.
So the trigger for the bug is that the leading comment contains Unicode.

The fix is that the leading comment string should be converted to utf-8
in the same way as the sql command.

TESTING:

Ran all end -to-end tests.
Added two test cases to tests/shell/test_shell_interactive.py

Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Reviewed-on: http://gerrit.cloudera.org:8080/12812
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M shell/impala_shell.py
M tests/shell/test_shell_interactive.py
2 files changed, 8 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Gerrit-Change-Number: 12812
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12812 )

Change subject: IMPALA-8325: Leading Unicode comments cause Impala Shell failure.
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3928/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/12812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Gerrit-Change-Number: 12812
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Mar 2019 18:07:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/12812 )

Change subject: IMPALA-8325: Leading Unicode comments cause Impala Shell failure.
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Gerrit-Change-Number: 12812
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Mar 2019 18:06:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/12812 )

Change subject: IMPALA-8325: Leading Unicode comments cause Impala Shell failure.
......................................................................


Patch Set 1: Code-Review+2

(4 comments)

Just couple nits, but LGTM. Thanks for fixing this!

http://gerrit.cloudera.org:8080/#/c/12812/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12812/1//COMMIT_MSG@23
PS1, Line 23:  
nit: extra space


http://gerrit.cloudera.org:8080/#/c/12812/1//COMMIT_MSG@30
PS1, Line 30:  
nit: extra space


http://gerrit.cloudera.org:8080/#/c/12812/1/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/12812/1/shell/impala_shell.py@1339
PS1, Line 1339: (leading_comment)
nit: unnecessary parentheses in python


http://gerrit.cloudera.org:8080/#/c/12812/1/shell/impala_shell.py@1340
PS1, Line 1340: leading_comment.encode('utf-8')
I missed this in my previous patch :( Thanks for the fix!



-- 
To view, visit http://gerrit.cloudera.org:8080/12812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Gerrit-Change-Number: 12812
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Mar 2019 17:41:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12812 )

Change subject: IMPALA-8325: Leading Unicode comments cause Impala Shell failure.
......................................................................


Patch Set 3: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Gerrit-Change-Number: 12812
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Mar 2019 22:18:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8325: Leading Unicode comments cause Impala Shell failure.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12812 )

Change subject: IMPALA-8325: Leading Unicode comments cause Impala Shell failure.
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8633935b6e0ca33594afd32ad242779555e09944
Gerrit-Change-Number: 12812
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Mar 2019 18:07:10 +0000
Gerrit-HasComments: No