Posted to issues-all@impala.apache.org by "David Knupp (Jira)" <ji...@apache.org> on 2020/03/19 17:14:00 UTC

[jira] [Issue Comment Deleted] (IMPALA-9489) Setup impala-shell.sh env separately, and use thrift-0.11.0 by default

     [ https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Knupp updated IMPALA-9489:
--------------------------------
    Comment: was deleted

(was: Review available at https://gerrit.cloudera.org/c/15417/)

> Setup impala-shell.sh env separately, and use thrift-0.11.0 by default
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-9489
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9489
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>    Affects Versions: Impala 3.4.0
>            Reporter: David Knupp
>            Assignee: David Knupp
>            Priority: Major
>
> [Note: this JIRA was filed in relation to the ongoing effort to make the impala-shell compatible with python 3]
> The impala python development environment is a fairly convoluted affair -- some packages are installed in infra/python/env, some come from the toolchain, and some are generated and live in the shell directory. Generally speaking, if you launch impala-python and import a module, it's hard to predict where that module will be loaded from.
> {noformat}
> $ python
> Python 2.7.10 (default, Aug 17 2018, 19:45:58)
> [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sasl
> >>> sasl
> <module 'sasl' from '/home/systest/Impala/shell/ext-py/sasl-0.1.1/dist/sasl-0.1.1-py2.7-linux-x86_64.egg/sasl/__init__.pyc'>
> >>> import requests
> >>> requests
> <module 'requests' from '/home/systest/Impala/infra/python/env/local/lib/python2.7/site-packages/requests/__init__.pyc'>
> >>> import Logging
> >>> Logging
> <module 'Logging' from '/home/systest/Impala/shell/gen-py/Logging/__init__.pyc'>
> >>> import thrift
> >>> thrift
> <module 'thrift' from '/home/systest/Impala/toolchain/thrift-0.9.3-p7/python/lib/python2.7/site-packages/thrift/__init__.pyc'>
> {noformat}
> Really, there is no one coherent environment -- there's just whatever collection of modules happens to be available at a given time for a given type of invocation, all of which is accomplished behind the scenes by calling scripts like {{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that are responsible for cobbling together a PYTHONPATH based on known locations and current env variables.
> As far as I can tell, there are three important contexts where python comes into play...
> * during the build process (used during data load, e.g., testdata/bin/load_nested.py)
> * when running the py.test-based e2e tests
> * whenever the impala-shell is invoked
> As noted by IMPALA-7825 (and also in a conversation I had with [~stakiar_impala_496e]), we're dependent on thrift 0.9.3 during the build process. This seems to come into play during the loading of test data (specifically, when calling testdata/bin/load_nested.py) mainly because at one point there was some well-intentioned but probably misguided attempt at code reuse from the test framework. The test code that gets re-used involves impyla and/or thrift-sasl, which currently still relies on thrift 0.9.3. So our test framework, and by extension the build, both inherit the same limitation.
> The impala-shell, on the other hand, luckily doesn't directly reuse any of the same test modules, and there really is no need to keep it pinned to 0.9.3. However, since calling {{impala-shell.sh}} winds up invoking {{set-pythonpath.sh}}, the same script that sets up the environment during building or testing, thrift 0.9.3 just kind of leaks over by default.
> As it turns out, thrift 0.9.3 is also one of the many limitations restricting the impala-shell to python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 is available -- we just have to use it. And the way to accomplish that is by decoupling the impala-shell from relying on either {{set-pythonpath.sh}} or {{impala-python-common.sh}}.
> As a first pass, we can address the dev environment by just having {{impala-shell.sh}} itself do whatever is required to find python dependencies, and we can specify thrift-0.11.0 there. Also, thrift 0.11.0 should be used by both of the scripts used to create the tarballs that package the impala-shell for customer environments. Neither of these changes should adversely affect building Impala or running the py.test test framework.
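
The decoupling described above can be illustrated with a rough Python 3 sketch. It is not the actual change for this issue: the function names build_shell_pythonpath and module_origin are hypothetical, and the directory layout (shell/gen-py, shell/ext-py/*/dist/*.egg, toolchain/thrift-<version>*/python/lib/python*/site-packages) is only inferred from the paths shown in the transcript above. The glob patterns, including the trailing wildcard that allows for a patch suffix such as "-p7", are assumptions. The idea is to assemble a search path for the shell alone, with the thrift version named explicitly, and then check which file each module would actually be loaded from.

```python
import glob
import importlib.machinery
import os


def build_shell_pythonpath(impala_home, thrift_version="0.11.0"):
    """Return directories a standalone impala-shell launcher might search.

    Hypothetical helper: the layout is inferred from the transcript in the
    issue description, and the glob patterns are assumptions.
    """
    shell_dir = os.path.join(impala_home, "shell")

    # Thrift-generated modules that live in the shell directory.
    candidates = [os.path.join(shell_dir, "gen-py")]

    # Bundled third-party eggs, e.g. shell/ext-py/sasl-0.1.1/dist/*.egg
    egg_glob = os.path.join(shell_dir, "ext-py", "*", "dist", "*.egg")
    candidates += sorted(glob.glob(egg_glob))

    # Thrift from the toolchain, pinned to the requested version. The
    # trailing "*" allows for a patch suffix on the directory name.
    thrift_glob = os.path.join(
        impala_home, "toolchain", "thrift-%s*" % thrift_version,
        "python", "lib", "python*", "site-packages")
    candidates += sorted(glob.glob(thrift_glob))

    # Keep only the entries that exist on disk.
    return [path for path in candidates if os.path.exists(path)]


def module_origin(name, search_path):
    """Return the file `name` would load from using only search_path."""
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    return spec.origin if spec else None


if __name__ == "__main__":
    # Assumes IMPALA_HOME points at the root of an Impala checkout.
    home = os.environ.get("IMPALA_HOME", ".")
    paths = build_shell_pythonpath(home)
    print(os.pathsep.join(paths))
    for module in ("thrift", "sasl"):
        print(module, "->", module_origin(module, paths))
```

Because the search path is passed in explicitly instead of being inherited from the build or test environment, the same check can be repeated with thrift_version="0.9.3" to compare what the shell would pick up in each case.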



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
