You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "David Knupp (Code Review)" <ge...@cloudera.org> on 2020/01/29 22:22:42 UTC

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

David Knupp has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15132


Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: WIP - Make impala-shell compatible with python 3.

This is a WIP patch for making the impala-shell code cross-compatible with
python 2 and python 3. The goal is wind up with a version of the shell that
will pass all python e2e tests repsective of the version of python used to
launch the shell, under the assumption that the test framework itself will
continue to run with python 2.7.x, irrespective of the shell version being
tested.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell on Python 3.7.5 with no authentication
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell on Python 2.7.12 with no authentication
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
15 files changed, 463 insertions(+), 309 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/3
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 3
Gerrit-Owner: David Knupp <dk...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5323/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 13
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 02:35:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 12:

(4 comments)

I did a pass over the change and had some high level questions/suggestions. I haven't focused yet on the string-handling part.

http://gerrit.cloudera.org:8080/#/c/15132/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15132/12//COMMIT_MSG@39
PS12, Line 39:   USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py
Is thrift 11 a requirement for python 3 support?


http://gerrit.cloudera.org:8080/#/c/15132/12//COMMIT_MSG@115
PS12, Line 115:   What ultimately seemed like a better approach was to try to weed out as many
Is there any way to document or enforce that the strings in the code are a particular type of string? It might help at least to ensure that some parts of the shell code are clean.


http://gerrit.cloudera.org:8080/#/c/15132/12//COMMIT_MSG@116
PS12, Line 116:   existing spurious str.encode() and str.decode() calls as I could. I'm not sure
Also, do you have any references to best practices or other info about python string handling?


http://gerrit.cloudera.org:8080/#/c/15132/12/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/15132/12/shell/impala_shell.py@51
PS12, Line 51: try:
Would it be worth factoring out some of these compatibility patterns into a separate file? E.g. you could import like this.

from compat import xrange, basestring

It might be overkill, but it did occur to me while reading through



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 12
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 13 Feb 2020 17:03:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 5:

> Patch Set 5: Verified-1
> 
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5478/

Failures were in Java front-end tests. All of the python e2e shell tests passed:
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/9874/testReport/

I'm clearing the -1, and will re-submit after cleaning up a few more issues.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 5
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 31 Jan 2020 18:29:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5502/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 19
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Mar 2020 03:16:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/15132/3/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/15132/3/shell/impala_client.py@669
PS3, Line 669:       print('{0} {1}'.format(str(e), type(e)), file=sys.stderr)
> Python 2.6, the default in RHEL 6, doesn't support formatting without posit
Done


http://gerrit.cloudera.org:8080/#/c/15132/3/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/15132/3/shell/impala_shell.py@342
PS3, Line 342: 
> flake8: E501 line too long (91 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/15132/3/shell/impala_shell.py@529
PS3, Line 529:  
> flake8: E226 missing whitespace around arithmetic operator
Done


http://gerrit.cloudera.org:8080/#/c/15132/3/shell/impala_shell.py@844
PS3, Line 844: 
> flake8: F841 local variable 'e' is assigned to but never used
Done


http://gerrit.cloudera.org:8080/#/c/15132/4/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/15132/4/shell/impala_shell.py@341
PS4, Line 341:  
> flake8: E129 visually indented line with same indent as next logical line
Done


http://gerrit.cloudera.org:8080/#/c/15132/3/tests/shell/test_shell_interactive.py
File tests/shell/test_shell_interactive.py:

http://gerrit.cloudera.org:8080/#/c/15132/3/tests/shell/test_shell_interactive.py@411
PS3, Line 411:       assert history_entry in result.stderr, "'%s' not in '%s'" % (history_entry,
> Why the change to stdout?
Done.

Admittedly, I got a little sloppy here. I missed a print_to_stderr call in one spot, noticed that the test failed as written, but that it passed with stdout. I meant to go back and track down what happened, but just lost track of it.



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 5
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 21:27:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5478/


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 5
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 31 Jan 2020 02:51:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Grant Henke, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#8).

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: Make impala-shell compatible with python 3.

This is patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests repsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x, irrespective of the shell version being tested.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell on Python 3.7.5 with no authentication
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell on Python 2.7.12 with no authentication
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
15 files changed, 492 insertions(+), 315 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/8
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 8
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: Make impala-shell compatible with python 3.

This is patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests repsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x, irrespective of the shell version being tested.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell on Python 3.7.5 with no authentication
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell on Python 2.7.12 with no authentication
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
15 files changed, 491 insertions(+), 313 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/7
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 7
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 10:

(3 comments)

OK, this is really the last revision to get it ready for final review. I was
so focused on getting the minicluster tests to pass that I neglected to check kerberized connections. minicluster tests are passing, and I've confirmed that
I can connect to both a kerberized cluster, and LDAP-protected cluster over
HTTP.

http://gerrit.cloudera.org:8080/#/c/15132/9/shell/thrift_sasl.py
File shell/thrift_sasl.py:

http://gerrit.cloudera.org:8080/#/c/15132/9/shell/thrift_sasl.py@27
PS9, Line 27: 
> flake8: E501 line too long (97 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/15132/9/shell/thrift_sasl.py@62
PS9, Line 62: 
> flake8: E261 at least two spaces before inline comment
Done


http://gerrit.cloudera.org:8080/#/c/15132/9/shell/thrift_sasl.py@64
PS9, Line 64: 
> flake8: E261 at least two spaces before inline comment
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 10
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Feb 2020 20:51:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Grant Henke, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#11).

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: Make impala-shell compatible with python 3.

This patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests irrepsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x, irrespective of the shell version being tested.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- thrift_sasl.py was updated to match the current public alpha, 0.4a1

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell on Python 3.7.5 with no authentication
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell on Python 2.7.12 with no authentication
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/TSSLSocketWithWildcardSAN.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M shell/thrift_sasl.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
17 files changed, 535 insertions(+), 339 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/11
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 11
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/13/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/15132/13/shell/impala_shell.py@25
PS13, Line 25: from compatibility import _basestring as basestring
flake8: F401 'compatibility._basestring as basestring' imported but unused



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 13
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 01:50:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has abandoned this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
......................................................................


Abandoned

Abandoning to breakup the patch into smaller more easily reviewed chunks.
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 20
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5325/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 15
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 02:55:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/7/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/15132/7/tests/shell/test_shell_commandline.py@1012
PS7, Line 1012: 
> flake8: W391 blank line at end of file
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 8
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Feb 2020 19:12:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/7/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/15132/7/tests/shell/test_shell_commandline.py@1012
PS7, Line 1012: 
flake8: W391 blank line at end of file



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 7
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Feb 2020 22:40:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15132/3//COMMIT_MSG@124
PS3, Line 124:   No effort has been made yet to come up with a way to integrate testing of the
I think IMPALA-8508 could help here by shipping a version of python 3 in the toolchain. We could run impala-shell using that python.



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 3
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jan 2020 23:40:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
......................................................................


Patch Set 19: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5482/


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 19
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Mar 2020 11:50:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py@485
PS15, Line 485:       if SHELL_IS_PYTHON_2:
> Still digging into this. The proposed change of cname.decode('utf8') fails 
Just a note to let you know this hasn't been languishing. I'm stuck in a weird loop that I'm still trying to figure out.

It turns out that your suggestion was definitely on the right path after all. What was misleading We seem to need to call decode('utf-8') in python 2, and forego in python 3. I also needed to revert a change I had made to https://github.com/apache/impala/blob/master/shell/impala_shell.py#L1221 -- essentially, we need to do the same thing. I've confirmed that with those changes, on both py2 and py3, the intended behavior works. Here's an example with a smaller set of chars.

  $ impala-shell.sh -q "select '?, ?, ?, ?, ?, ?, ?'"
  Starting Impala with no authentication using Python 2.7.12
  Opened TCP connection to localhost:21000
  Connected to localhost:21000
  Server version: impalad version 3.4.0-SNAPSHOT DEBUG (build b0634b1341f6189b8c7e0c1ba4cce810e947c52a)
  Query: select '?, ?, ?, ?, ?, ?, ?'
  Query submitted at: 2020-03-06 11:26:31 (Coordinator: http://dknupp-desktop:25000)
  Query progress can be monitored at: http://dknupp-desktop:25000/query_plan? 
  query_id=b54072afd6be4181:9ea5acc200000000
  +-----------------------+
  | '?, ?, ?, ?, ?, ?, ?' |
  +-----------------------+
  | ?, ?, ?, ?, ?, ?, ?   |
  +-----------------------+
  Fetched 1 row(s) in 0.12s

And it works interactively.

  $ impala-shell.sh
  Starting Impala with no authentication using Python 2.7.12
  Opened TCP connection to localhost:21000
  Connected to localhost:21000
  Server version: impalad version 3.4.0-SNAPSHOT DEBUG (build b0634b1341f6189b8c7e0c1ba4cce810e947c52a)
  ***********************************************************************************
  Welcome to the Impala shell.
  (Impala Shell v3.4.0-SNAPSHOT (b0634b1) built on Fri Feb 21 13:49:48 PST 2020)

  Press TAB twice to see a list of available commands.
  ***********************************************************************************
  [localhost:21000] default> select '?, ?, ?, ?, ?, ?, ?';
  Query: select '?, ?, ?, ?, ?, ?, ?'
  Query submitted at: 2020-03-06 11:31:23 (Coordinator: http://dknupp-desktop:25000)
  Query progress can be monitored at: http://dknupp-desktop:25000/query_plan? 
  query_id=8b4b4528051650f5:9520650700000000
  +-----------------------+
  | '?, ?, ?, ?, ?, ?, ?' |
  +-----------------------+
  | ?, ?, ?, ?, ?, ?, ?   |
  +-----------------------+
  Fetched 1 row(s) in 0.12s
  [localhost:21000] default>

However (and ignoring the added complication of tabs being present), test_international_characters_prettyprint is now failing with a UnicodeDecodeError. On the other hand, that test is passing on my original version -- but then actual usage fails. It's weird.



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 17
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Mar 2020 19:33:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15132/3//COMMIT_MSG@124
PS3, Line 124:   No effort has been made yet to come up with a way to integrate testing of the
That's a good point.

> There's a part of me that wonders, since we're providing a publicly pip-installable tarball on PyPI, is there even a need to keep generating the tarball? Could using the locally available tarball simply be replaced with "pip install impala-shell"? (Though I'm also sure there are ramifications to doing that that are not occurring to me.)

I can think of two reasons (not sure if they're good reasons, but they are reasons)
* someone doesn't like using pip for some reason (the python packaging ecosystem is pretty fragmented).
* someone has other infrastructure that consumes the tarball, e.g. deployment scripts, dockerfiles, packaging scripts.

I believe my employer has some of those that wouldn't be easily convertible to use a different way of packaging the shell. That's no real reason for the Apache project to keep the maintenance burden, but it wouldn't be surprising if there were more consumers out there.



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 3
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 22:09:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 15:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15132/15/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/15132/15/shell/impala_shell.py@962
PS15, Line 962:     if flush is True:
nit: "if flush" might be more pythonic


http://gerrit.cloudera.org:8080/#/c/15132/15/shell/thrift_sasl.py
File shell/thrift_sasl.py:

http://gerrit.cloudera.org:8080/#/c/15132/15/shell/thrift_sasl.py@23
PS15, Line 23: # Initially copied from the Impala repo
Comment probably needs updating?


http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py@485
PS15, Line 485:       if SHELL_IS_PYTHON_2:
I don't think I quite understand this - based on the control flow it seems like this is breaking on beeswax, not HS2 (the assert below should only raise an exception if this fails on beeswax).


http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py@1004
PS15, Line 1004: he the
nit: repeated word



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 15
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 19:46:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 17:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5361/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 17
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Feb 2020 02:33:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/4/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/15132/4/shell/impala_shell.py@341
PS4, Line 341:  
flake8: E129 visually indented line with same indent as next logical line



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 4
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 21:19:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Grant Henke, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#10).

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: Make impala-shell compatible with python 3.

This is patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests repsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x, irrespective of the shell version being tested.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- thrift_sasl.py was updated to match the current public alpha, 0.4a1

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell on Python 3.7.5 with no authentication
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell on Python 2.7.12 with no authentication
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/TSSLSocketWithWildcardSAN.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M shell/thrift_sasl.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
17 files changed, 535 insertions(+), 339 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/10
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 10
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#16).

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: Make impala-shell compatible with python 3.

This patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests irrepsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- thrift_sasl.py was updated to match the current public alpha, 0.4a1

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/TSSLSocketWithWildcardSAN.py
A shell/compatibility.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M shell/thrift_sasl.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
18 files changed, 612 insertions(+), 339 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/16
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 16
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 5:

> Patch Set 5:
> 
> (1 comment)

Bad wording made that last question hard to parse. Sorry. Should be:

"...is there even a need to keep generating the local build/tarball?"


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 5
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 22:05:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py@485
PS15, Line 485:       if SHELL_IS_PYTHON_2:
> I got confused by the scenario, but I checked this out and played around an
Still digging into this. The proposed change of cname.decode('utf8') fails under python 3 because with python 3:

  'str' object has no attribute 'decode'

In the meantime, with that change, the test against the built tarball still fails:

E   Query: select '?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,?, ?, ?, ?, ?, ?,?, ?, ?, ?, ?, ?, ?, ?, ?\t'
  E   Query submitted at: 2020-03-02 14:06:17 (Coordinator: http://dknupp-desktop:25000)
  E   Query progress can be monitored at: http://dknupp-desktop:25000/query_plan?query_id=c649f690ad99d702:8571f57400000000
  E   Unknown Exception : 'ascii' codec can't encode character u'\u0430' in position 107: ordinal not in range(128)
  E   Traceback (most recent call last):
  E     File "/home/dknupp/Impala/shell/build/impala-shell-3.4.0-SNAPSHOT/impala_shell.py", line 1934, in <module>
  E       impala_shell_main()
  E     File "/home/dknupp/Impala/shell/build/impala-shell-3.4.0-SNAPSHOT/impala_shell.py", line 1888, in impala_shell_main
  E       if execute_queries_non_interactive_mode(options, query_options):
  E     File "/home/dknupp/Impala/shell/build/impala-shell-3.4.0-SNAPSHOT/impala_shell.py", line 1697, in execute_queries_non_interactive_mode
  E       shell.execute_query_list(queries))
  E     File "/home/dknupp/Impala/shell/build/impala-shell-3.4.0-SNAPSHOT/impala_shell.py", line 1531, in execute_query_list
  E       print('Could not execute command: %s' % q, file=sys.stderr)
  E   UnicodeEncodeError: 'ascii' codec can't encode character u'\u0410' in position 35: ordinal not in range(128)



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 17
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Mar 2020 22:09:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#14).

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: Make impala-shell compatible with python 3.

This patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests irrepsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- thrift_sasl.py was updated to match the current public alpha, 0.4a1

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/TSSLSocketWithWildcardSAN.py
A shell/compatibility.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M shell/thrift_sasl.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
18 files changed, 607 insertions(+), 339 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/14
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 14
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 12:

(4 comments)

Patch 13 will address all of the comments on patch set 12.

http://gerrit.cloudera.org:8080/#/c/15132/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15132/12//COMMIT_MSG@39
PS12, Line 39:   USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py
> Is thrift 11 a requirement for python 3 support?
Thrift >= 11 is required for python 3, and so USE_THRIFT11_GEN_PY must be true when we build the package or test the shell in a python3 environment. In fact, we really need to enforce this, not just take it on faith, so my next patch will address that.


http://gerrit.cloudera.org:8080/#/c/15132/12//COMMIT_MSG@115
PS12, Line 115:   What ultimately seemed like a better approach was to try to weed out as many
> Is there any way to document or enforce that the strings in the code are a 
I'm not sure about this -- there's probably some more cleanup that we could do. I initially tried to follow the technique of the so-called "unicode sandwich" (decode early, encode at the edge), but I was still seeing some failures. It's possible I missed something, or that we have an inherently unstable situation, testing with python 2.x in all cases, but against a shell that might be running with either python 2 or 3.

I'm open to trying out any specific suggestions you might have.


http://gerrit.cloudera.org:8080/#/c/15132/12//COMMIT_MSG@116
PS12, Line 116:   existing spurious str.encode() and str.decode() calls as I could. I'm not sure
> Also, do you have any references to best practices or other info about pyth
Yeah -- there's almost so much information online on this topic that it's hard to sift through.

This seems like a good place to start:
https://medium.com/better-programming/strings-unicode-and-bytes-in-python-3-everything-you-always-wanted-to-know-27dc02ff2686

This slide deck is frequently referenced:
https://nedbatchelder.com/text/unipain/unipain.html#1


http://gerrit.cloudera.org:8080/#/c/15132/12/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/15132/12/shell/impala_shell.py@51
PS12, Line 51: try:
> Would it be worth factoring out some of these compatibility patterns into a
I gave it a shot. Turns out that redefining built-ins outright is allowed, but importing is not, so I had to do a bit of renaming.



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 12
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 01:49:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5161/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 8
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Feb 2020 20:01:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/19/bin/impala-shell.sh
File bin/impala-shell.sh:

http://gerrit.cloudera.org:8080/#/c/15132/19/bin/impala-shell.sh@55
PS19, Line 55: PYTHONPATH=${PYTHONPATH} exec "${IMPALA_PYTHON_EXECUTABLE}" ${SHELL_HOME}/impala_shell.py "$@"
line too long (94 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 19
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Mar 2020 02:32:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5558/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 5
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 22:12:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15132/3//COMMIT_MSG@124
PS3, Line 124:   No effort has been made yet to come up with a way to integrate testing of the
> It's also possible that python 3 is already installed on the workers that r
Bad wording made that last question hard to parse. Sorry. Should be:

"...is there even a need to keep generating the local build/tarball?"



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 5
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 22:06:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Grant Henke, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#4).

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: WIP - Make impala-shell compatible with python 3.

This is a WIP patch for making the impala-shell code cross-compatible with
python 2 and python 3. The goal is wind up with a version of the shell that
will pass all python e2e tests repsective of the version of python used to
launch the shell, under the assumption that the test framework itself will
continue to run with python 2.7.x, irrespective of the shell version being
tested.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell on Python 3.7.5 with no authentication
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell on Python 2.7.12 with no authentication
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
15 files changed, 463 insertions(+), 310 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/4
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 4
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 3:

(2 comments)

Just passing through with a quick glance review since I did this for Kudu recently.

http://gerrit.cloudera.org:8080/#/c/15132/3/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/15132/3/shell/impala_client.py@669
PS3, Line 669:       print('{} {}'.format(str(e), type(e)), file=sys.stderr)
Python 2.6, the default in RHEL 6, doesn't support formatting without positions. Using '{0} {1}' is most compatible.
https://docs.python.org/2/library/string.html#format-string-syntax


http://gerrit.cloudera.org:8080/#/c/15132/3/tests/shell/test_shell_interactive.py
File tests/shell/test_shell_interactive.py:

http://gerrit.cloudera.org:8080/#/c/15132/3/tests/shell/test_shell_interactive.py@411
PS3, Line 411:       assert history_entry in result.stdout, "'%s' not in '%s'" % (history_entry,
Why the change to stdout?



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 3
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jan 2020 23:20:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
......................................................................


Patch Set 20:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/20/bin/impala-shell.sh
File bin/impala-shell.sh:

http://gerrit.cloudera.org:8080/#/c/15132/20/bin/impala-shell.sh@55
PS20, Line 55: PYTHONPATH=${PYTHONPATH} exec "${IMPALA_PYTHON_EXECUTABLE}" ${SHELL_HOME}/impala_shell.py "$@"
line too long (94 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 20
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Mar 2020 16:55:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/13/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/15132/13/shell/impala_shell.py@25
PS13, Line 25: 
> flake8: F401 'compatibility._basestring as basestring' imported but unused
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 14
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 01:54:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#13).

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: Make impala-shell compatible with python 3.

This patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests irrepsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- thrift_sasl.py was updated to match the current public alpha, 0.4a1

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/TSSLSocketWithWildcardSAN.py
A shell/compatibility.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M shell/thrift_sasl.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
18 files changed, 608 insertions(+), 339 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/13
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 13
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has removed a vote on this change.

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Removed Verified-1 by Impala Public Jenkins <im...@cloudera.com>
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 5
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5162/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 9
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Feb 2020 21:11:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5542/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 3
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jan 2020 23:07:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15132/3//COMMIT_MSG@124
PS3, Line 124:   No effort has been made yet to come up with a way to integrate testing of the
> I think IMPALA-8508 could help here by shipping a version of python 3 in th
It's also possible that python 3 is already installed on the workers that run our tests.

There's a part of me that wonders, since we're providing a publicly pip-installable tarball on PyPI, is there even a need to keep generating the tarball? Could using the locally available tarball simply be replaced with "pip install impala-shell"? (Though I'm also sure there are ramifications to doing that that are not occurring to me.)



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 5
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 21:33:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5163/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 10
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Feb 2020 21:18:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#15).

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: Make impala-shell compatible with python 3.

This patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests irrepsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- thrift_sasl.py was updated to match the current public alpha, 0.4a1

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/TSSLSocketWithWildcardSAN.py
A shell/compatibility.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M shell/thrift_sasl.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
18 files changed, 612 insertions(+), 339 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/15
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 15
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5324/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 14
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 02:40:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
......................................................................


Patch Set 20: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5485/


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 20
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Mar 2020 21:48:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 7:

I'm considering this patch ready for final review.

A couple of the shell tests won't totally work in a fully py2/py3 cross compatible way, and they were noted in the test files with comments.
These are:

- TestImpalaShellInteractive.test_fix_infinite_loop
- TestImpalaShell.test_international_characters_prettyprint_tabs

The next major shell patch should be to upgrade sqlparse. I filed IMPALA-9362
to track that work.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 7
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Feb 2020 22:47:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Grant Henke, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#12).

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: Make impala-shell compatible with python 3.

This patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests irrepsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- thrift_sasl.py was updated to match the current public alpha, 0.4a1

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell on Python 3.7.5 with no authentication
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell on Python 2.7.12 with no authentication
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/TSSLSocketWithWildcardSAN.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M shell/thrift_sasl.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
17 files changed, 535 insertions(+), 339 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/12
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 12
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

This patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests irrepsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python
  environment independenty from bin/set-pythonpath.sh. (See IMPALA-9489)

- thrift_sasl.py was updated to match the current public alpha, 0.4a1

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could, and try to
  implement what is has colloquially been called a "unicode sandwich" -- namely,
  "bytes on the outside, unicode on the inside, encode/decode at the edges."

- from __future__ import print_function, unicode_literals was added to each
  relevant impala-shell related file

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/impala-shell.sh
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/TSSLSocketWithWildcardSAN.py
A shell/compatibility.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/pkg_resources.py
M shell/shell_output.py
M shell/thrift_sasl.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
20 files changed, 680 insertions(+), 357 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/19
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 19
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py@485
PS15, Line 485:       if SHELL_IS_PYTHON_2:
> Just a note to let you know this hasn't been languishing. I'm stuck in a we
Ah, and it's failing in the shell code, not actually in the test code.



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 17
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Mar 2020 19:45:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15132/15/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/15132/15/shell/impala_shell.py@962
PS15, Line 962:     if flush:
> nit: "if flush" might be more pythonic
Done


http://gerrit.cloudera.org:8080/#/c/15132/15/shell/thrift_sasl.py
File shell/thrift_sasl.py:

http://gerrit.cloudera.org:8080/#/c/15132/15/shell/thrift_sasl.py@23
PS15, Line 23: # Initially copied from the Impala repo
> Comment probably needs updating?
I think I meant the thrift_sasl repo, and I just made a mistake? In any event, this file will get a wholesale replacement soon, since thrift_sasl.py has been updated upstream. I'll wait until IMPALA-9424 (https://gerrit.cloudera.org/c/15293/) is merged, and then I'll replace this file here.


http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py@485
PS15, Line 485:       if SHELL_IS_PYTHON_2:
> I don't think I quite understand this - based on the control flow it seems 
The primary assert happens above this line, regardless of python version.

    if protocol == 'beeswax':
      assert 'Reverting to tab delimited text' in result.stderr

So, if protocol is beeswax, assert as expected.
Else if protocol is NOT beeswax, check if SHELL_IS_PYTHON_2, and if so, just skip.
However, if protocol is NOT beeswax, but shell is python 3 (or, presumably, above 3 some day) then proceed with test to the end.

Please let me know if I should re-order this to make it more legible.

Honestly, this comment almost makes me think we should just skip altogether if not beeswax.

  # HS2 does not need to fall back, but should behave appropriately.


http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py@1004
PS15, Line 1004: he
> nit: repeated word
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 16
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Feb 2020 00:42:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/19/bin/impala-shell.sh
File bin/impala-shell.sh:

http://gerrit.cloudera.org:8080/#/c/15132/19/bin/impala-shell.sh@55
PS19, Line 55: PYTHONPATH=${PYTHONPATH} exec "${IMPALA_PYTHON_EXECUTABLE}" ${SHELL_HOME}/impala_shell.py "$@"
> line too long (94 > 90)
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 19
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Mar 2020 07:56:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5339/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 16
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Feb 2020 01:29:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Grant Henke, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#9).

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: Make impala-shell compatible with python 3.

This is patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests repsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x, irrespective of the shell version being tested.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- thrift_sasl.py was updated to match the current public alpha, 0.4a1

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell on Python 3.7.5 with no authentication
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell on Python 2.7.12 with no authentication
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/TSSLSocketWithWildcardSAN.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M shell/thrift_sasl.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
17 files changed, 530 insertions(+), 341 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/9
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 9
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
......................................................................


Patch Set 20:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5485/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 20
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Mar 2020 16:54:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5557/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 4
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 22:06:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
......................................................................


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5482/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 19
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Mar 2020 07:07:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Grant Henke, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#5).

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: WIP - Make impala-shell compatible with python 3.

This is a WIP patch for making the impala-shell code cross-compatible with
python 2 and python 3. The goal is wind up with a version of the shell that
will pass all python e2e tests repsective of the version of python used to
launch the shell, under the assumption that the test framework itself will
continue to run with python 2.7.x, irrespective of the shell version being
tested.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell on Python 3.7.5 with no authentication
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell on Python 2.7.12 with no authentication
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
15 files changed, 463 insertions(+), 310 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/5
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 5
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#20).

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

This patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests irrepsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python
  environment independenty from bin/set-pythonpath.sh. (See IMPALA-9489)

- thrift_sasl.py was updated to match the current public alpha, 0.4a1

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could, and try to
  implement what is has colloquially been called a "unicode sandwich" -- namely,
  "bytes on the outside, unicode on the inside, encode/decode at the edges."

- from __future__ import print_function, unicode_literals was added to each
  relevant impala-shell related file

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/impala-shell.sh
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/TSSLSocketWithWildcardSAN.py
A shell/compatibility.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/pkg_resources.py
M shell/shell_output.py
M shell/thrift_sasl.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
20 files changed, 684 insertions(+), 362 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/20
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 20
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5509/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 20
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Mar 2020 17:38:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5144/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 7
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Feb 2020 23:26:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15132/3/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/15132/3/shell/impala_shell.py@342
PS3, Line 342: )
flake8: E501 line too long (91 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/15132/3/shell/impala_shell.py@529
PS3, Line 529: -
flake8: E226 missing whitespace around arithmetic operator


http://gerrit.cloudera.org:8080/#/c/15132/3/shell/impala_shell.py@844
PS3, Line 844: e
flake8: F841 local variable 'e' is assigned to but never used


http://gerrit.cloudera.org:8080/#/c/15132/3/shell/shell_output.py
File shell/shell_output.py:

http://gerrit.cloudera.org:8080/#/c/15132/3/shell/shell_output.py@23
PS3, Line 23: import os
flake8: F401 'os' imported but unused



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 3
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jan 2020 22:23:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Grant Henke, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#6).

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: WIP - Make impala-shell compatible with python 3.

This is a WIP patch for making the impala-shell code cross-compatible with
python 2 and python 3. The goal is wind up with a version of the shell that
will pass all python e2e tests repsective of the version of python used to
launch the shell, under the assumption that the test framework itself will
continue to run with python 2.7.x, irrespective of the shell version being
tested.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell on Python 3.7.5 with no authentication
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell on Python 2.7.12 with no authentication
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
15 files changed, 463 insertions(+), 310 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/6
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 6
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15132/9/shell/thrift_sasl.py
File shell/thrift_sasl.py:

http://gerrit.cloudera.org:8080/#/c/15132/9/shell/thrift_sasl.py@27
PS9, Line 27: n
flake8: E501 line too long (97 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/15132/9/shell/thrift_sasl.py@62
PS9, Line 62:  
flake8: E261 at least two spaces before inline comment


http://gerrit.cloudera.org:8080/#/c/15132/9/shell/thrift_sasl.py@64
PS9, Line 64:  
flake8: E261 at least two spaces before inline comment



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 9
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Feb 2020 20:26:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15132

to look at the new patch set (#17).

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................

IMPALA-3343: Make impala-shell compatible with python 3.

This patch makes the impala-shell code cross-compatible with python 2 and
python 3. The goal is wind up with a version of the shell that will pass
python e2e tests irrepsective of the version of python used to launch the shell,
under the assumption that the test framework itself will continue to run with
python 2.7.x.

There are a few isolated tests that weren't able to pass under both versions,
and the reasons have been documented in comments in the test themselves.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- Many of the simpler changes derive from the fact that a few built-in functions
  and/or types have either been removed or have else changed in python 3.x,
  E.g., xrange and basestring are both gone, dict.iteritems() has been removed,
  dict.items() behaves differently, the unicode() function and the method
  str.decode() have both been removed, etc.

  Also, catching exceptions using "Exception, e" no longer works, and (as most
  know), using print() as a function is required now.

- A new pytest command line option was added in conftest.py to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- $IMPALA_HOME/bin/set-pythonpath.sh was updated to properly use the thrift-11
  gen-py files if USE_THRIFT11_GEN_PY is set to "true". This is required for
  testing a version of the impala-shell in a python3-based virtualenv.

- thrift_sasl.py was updated to match the current public alpha, 0.4a1

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000
    ...

    OR

    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000
    ...

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')
  True

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  True
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  True
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?
  False

  As a result of this, the way we handled strings in the impala-shell code had
  become equally loose and inconsistent -- mainly in the form of frequent and
  liberal use of str.encode() and str.decode() -- but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  ['foo']

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  At first, I tried to go one-by-one to the site of each failure, and correct
  by checking instance type and re-encoding as necessary, but this only led to
  even more str.encode() calls littering the code, which just seemed like a
  code-smell. (Wiki "code smell" if you don't know the term.)

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could. I'm not sure
  I got it as clean as it could be, but I think what I'm submitting is much
  cleaner than the original code.

Testing:

  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and ran the tests against that.

  In some cases, not all of the tests pass under python3. These have been marked
  with a TODO comment. In one case, there was regression with regard to the
  standard python2 shell (although, funnily, in this latter case, the test
  passes in a python3 environment).

  No effort has been made yet to come up with a way to integrate testing of the
  shell in a python3 environment into our automated test processes.

Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
---
M bin/set-pythonpath.sh
M infra/python/deps/compiled-requirements.txt
M shell/TSSLSocketWithWildcardSAN.py
A shell/compatibility.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/packaging/requirements.txt
M shell/packaging/setup.py
M shell/shell_output.py
M shell/thrift_sasl.py
M tests/conftest.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
18 files changed, 611 insertions(+), 338 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/15132/17
-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 17
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3343: Make impala-shell compatible with python 3.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: Make impala-shell compatible with python 3.
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/15132/15/tests/shell/test_shell_commandline.py@485
PS15, Line 485:       if SHELL_IS_PYTHON_2:
> The primary assert happens above this line, regardless of python version.
I got confused by the scenario, but I checked this out and played around and understand now.

I think this regression is kinda bad, this would affect anyone using non-ascii characters with HS2. I think if we don't fix it it's important to check that the fallback logic also works for HS2, since that's the behaviour in this patchset (so the test shouldn't be skipped either way).

FWIW I played around to understand and it looks like the issue on python2 is fixed by this, at least when I'm running impala-shell.sh

+    column_names = [cname.decode('utf8') for cname in self.imp_client.get_column_names(self.last_query_handle)]
-    column_names = self.imp_client.get_column_names(self.last_query_handle)

   impala-shell.sh -q 'select "?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,"'



-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 15
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Feb 2020 22:11:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3343: WIP - Make impala-shell compatible with python 3.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15132 )

Change subject: IMPALA-3343: WIP - Make impala-shell compatible with python 3.
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5478/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/15132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb75e162bac0faeae3e12106c15da39cbfb8b462
Gerrit-Change-Number: 15132
Gerrit-PatchSet: 5
Gerrit-Owner: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 22:02:36 +0000
Gerrit-HasComments: No