Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2024/04/28 07:17:00 UTC

[jira] [Assigned] (IMPALA-10848) Provide compile-only option to skip downloading test dependencies

     [ https://issues.apache.org/jira/browse/IMPALA-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang reassigned IMPALA-10848:
---------------------------------------

    Assignee: Quanlong Huang  (was: XiangYang)

OK, assigning this to myself.

> Provide compile-only option to skip downloading test dependencies
> -----------------------------------------------------------------
>
>                 Key: IMPALA-10848
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10848
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>         Attachments: pywebhdfs_failure.png
>
>
> Compiling Impala is not easy for a beginner. A portion of the failures occur while downloading or installing dependencies.
> For instance, old versions of Impala may fail to compile because the CDH components of old GBNs have been removed from S3. However, the CDH component artifacts are only used in testing (the minicluster and holding test data). We can still compile without them.
> Take pip dependencies as another example. Here is a failure reported by a community user; it failed while installing pywebhdfs:
> !pywebhdfs_failure.png!
> However, a simple git grep shows that pywebhdfs is only used in tests:
> {code:bash}
> $ git grep pywebhdfs
> bin/bootstrap_system.sh:#  >>> from pywebhdfs.webhdfs import PyWebHdfsClient
> infra/python/deps/requirements.txt:pywebhdfs == 0.3.2
> tests/common/impala_test_suite.py:    #     HDFS: uses a mixture of pywebhdfs (which is faster than the HDFS CLI) and the
> tests/util/hdfs_util.py:from pywebhdfs.webhdfs import PyWebHdfsClient, errors, _raise_pywebhdfs_exception
> tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
> tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
> tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
> tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
> {code}
> If the user just wants to compile Impala and deploy it in their existing Hadoop cluster, dealing with these failures is a waste of their time.
> *Target for this JIRA*
>  * Provide a compile-only option for bin/bootstrap_system.sh. It should skip downloading/installing unused dependencies like postgresql.
>  * Provide a compile-only option for buildall.sh. It should skip downloading cdh/cdp components that are not used in compilation.
>  * Update our [wiki|https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala] about this.
> Note that we already have some env vars to control the download behavior, e.g. SKIP_PYTHON_DOWNLOAD, SKIP_TOOLCHAIN_BOOTSTRAP. We just need to make the compile-only scenario work with minimal requirements and document it.
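
One way to approach the compile-only targets in the quoted issue is to gate test-only setup steps behind a single switch. The following is a minimal sketch, not the actual change: COMPILE_ONLY, run_unless_compile_only, and the two stand-in step functions are hypothetical names chosen for illustration and are not taken from bin/bootstrap_system.sh or buildall.sh.

```shell
#!/usr/bin/env bash
# Sketch only. COMPILE_ONLY and run_unless_compile_only are hypothetical names
# used for illustration; they are not existing Impala options.

# Run the given command unless COMPILE_ONLY=true, in which case only log it.
run_unless_compile_only() {
  if [ "${COMPILE_ONLY:-false}" = "true" ]; then
    echo "compile-only: skipping '$*'"
    return 0
  fi
  "$@"
}

# Hypothetical stand-in for a test-only setup step
# (for example, installing a dependency that only tests import).
install_test_only_deps() {
  echo "installing test-only dependencies"
}

# Hypothetical stand-in for a step that compilation itself needs.
install_build_deps() {
  echo "installing build dependencies"
}

# Build dependencies are always installed; test-only ones are gated.
install_build_deps
run_unless_compile_only install_test_only_deps
```

Running the script with COMPILE_ONLY=true in the environment would log the skipped step instead of executing it. The existing SKIP_PYTHON_DOWNLOAD and SKIP_TOOLCHAIN_BOOTSTRAP variables mentioned in the issue could presumably be consulted in a similar way, but how they should interact with a compile-only mode is not specified in the issue.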



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
