Posted to issues@impala.apache.org by "David Knupp (JIRA)" <ji...@apache.org> on 2018/03/19 19:11:00 UTC

[jira] [Created] (IMPALA-6702) Consider using standard pip client to download python dependencies

David Knupp created IMPALA-6702:
-----------------------------------

             Summary: Consider using standard pip client to download python dependencies
                 Key: IMPALA-6702
                 URL: https://issues.apache.org/jira/browse/IMPALA-6702
             Project: IMPALA
          Issue Type: Improvement
          Components: Infrastructure
    Affects Versions: Impala 3.0, Impala 2.12.0
            Reporter: David Knupp


Impala currently uses a hand-rolled client to download python dependencies:
 [https://github.com/apache/impala/blob/master/infra/python/deps/pip_download.py]

This client skips the install step, adds automatic retries, and avoids wheel packages. However, the standard pip client can do all of these things as well. Upstream changes to PyPI sometimes break this custom client, most recently in IMPALA-6682 and IMPALA-6695. Perhaps Impala should consider dropping pip_download.py in favor of the standard pip client. (kudu-python presents a problem though; see below.)
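
As a rough illustration of how those three behaviours map onto standard pip options, here is a minimal sketch that only builds the command line and does not run it. The helper names {{build_pip_download_cmd}} and {{find_requirements_files}} are hypothetical and not part of Impala. The {{--retries}} option and its value are an assumption added for illustration. The sketch also passes one {{-r}} per requirements file instead of relying on a shell glob.

```python
import glob
import os


def find_requirements_files(deps_dir):
    """Return every *requirements.txt file in deps_dir, sorted for stable ordering."""
    return sorted(glob.glob(os.path.join(deps_dir, "*requirements.txt")))


def build_pip_download_cmd(deps_dir, requirements_files, retries=5):
    """Build (but do not run) a `pip download` argv list.

    Hypothetical helper: the retries value is an illustrative assumption.
    """
    cmd = [
        "pip", "download",          # download only, no install step
        "--dest", deps_dir,         # where the archives are saved
        "--no-binary", ":all:",     # source distributions only, no wheels
        "--no-deps",                # fetch only the packages listed in the files
        "--retries", str(retries),  # retry failed connections
    ]
    for req in requirements_files:
        cmd += ["-r", req]          # one -r flag per requirements file
    return cmd
```

The resulting list could be handed to {{subprocess.check_call}} with {{deps_dir}} set to {{$IMPALA_HOME/infra/python/deps}}.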

A quick test showed some minor differences between pip_download.py and pip v9.0.2 when processing the various requirements.txt files at [https://github.com/apache/impala/tree/master/infra/python/deps]. The pip command used was:
{noformat}
$ pip download --dest=$IMPALA_HOME/infra/python/deps --no-binary=:all: --no-deps -r $IMPALA_HOME/infra/python/deps/*requirements.txt
{noformat}
For requirements.txt:
 * pip v9.0.2
 ** Ignores readline: markers 'sys_platform == "darwin"' don't match your environment
 ** Downloads prettytable-0.7.2.zip
 ** Downloads pyparsing-2.0.3.zip
 * pip_download.py
 ** Downloads readline-6.2.4.1.tar.gz
 ** Downloads prettytable-0.7.2.tar.bz2
 ** Downloads pyparsing-2.0.3.tar.gz

For compiled-requirements.txt:
 * pip v9.0.2
 ** Downloads Cython-0.23.4.zip
 ** Downloads numpy-1.10.4.zip
 * pip_download.py
 ** Downloads Cython-0.23.4.tar.gz
 ** Downloads numpy-1.10.4.tar.gz

For adls-requirements.txt:
 * no difference
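
To make the comparison above concrete, the following sketch strips the archive extension from each file name and compares the resulting package-version sets. It uses only the file names reported above for requirements.txt, which are the ones that differed and not the full download list. The helper names are hypothetical.

```python
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".zip")


def strip_archive_suffix(filename):
    """Drop a known archive extension, leaving 'name-version'."""
    for suffix in ARCHIVE_SUFFIXES:
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return filename


def compare_downloads(pip_files, custom_files):
    """Return (only fetched by the custom client, only fetched by pip)."""
    pip_pkgs = {strip_archive_suffix(f) for f in pip_files}
    custom_pkgs = {strip_archive_suffix(f) for f in custom_files}
    return sorted(custom_pkgs - pip_pkgs), sorted(pip_pkgs - custom_pkgs)


# File names taken from the requirements.txt results listed above.
pip_files = ["prettytable-0.7.2.zip", "pyparsing-2.0.3.zip"]
custom_files = [
    "readline-6.2.4.1.tar.gz",
    "prettytable-0.7.2.tar.bz2",
    "pyparsing-2.0.3.tar.gz",
]

only_custom, only_pip = compare_downloads(pip_files, custom_files)
print(only_custom)  # ['readline-6.2.4.1']
print(only_pip)     # []
```

Once the archive format is ignored, the only remaining difference is readline, which pip skipped because of the {{sys_platform == "darwin"}} marker.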

 

Unfortunately, kudu-requirements.txt, which contains only one dependency ({{kudu-python==1.2.0}}), is problematic. Even with pip's {{download}} command, pip runs {{python setup.py egg_info}} on the source package, and that step fails because setup.py cannot find an installed Kudu client:
{noformat}
Using cached kudu-python-1.2.0.tar.gz
 Saved ./kudu-python-1.2.0.tar.gz
 Complete output from command python setup.py egg_info:
 Cannot find installed kudu client.
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-N1Mu7y/kudu-python/
{noformat}
We would need to either figure out why this is happening (it may be a bug in the Kudu setup.py file) or find a workaround for this one package.

 


