Posted to issues@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2020/12/28 03:09:00 UTC

[jira] [Created] (IMPALA-10409) Reduce total size of artifacts downloaded from S3 in building

Quanlong Huang created IMPALA-10409:
---------------------------------------

             Summary: Reduce total size of artifacts downloaded from S3 in building
                 Key: IMPALA-10409
                 URL: https://issues.apache.org/jira/browse/IMPALA-10409
             Project: IMPALA
          Issue Type: Improvement
          Components: Infrastructure
            Reporter: Quanlong Huang


When building Impala, we need to download lots of dependencies.

[~joemcdonnell] helped examine where all the jars come from:
{code:java}
Number of artifacts downloaded from each repo:
     16 cdh.rcs.releases.repo
   2067 central
    203 impala.cdp.repo
      2 impala.toolchain.kudu.repo
{code}
In my local environment, most of the build time is spent downloading artifacts from Cloudera's S3 bucket. Some of the files are large, e.g.
{code:java}
458.2 MiB llvm-5.0.1-asserts-p3-gcc-7.5.0-ec2-package-ubuntu-16-04.tar.gz
373.4 MiB llvm-5.0.1-p3-gcc-7.5.0-ec2-package-ubuntu-16-04.tar.gz
1.1 GiB kudu-6a7cadc7e-gcc-7.5.0-ec2-package-ubuntu-16-04.tar.gz
333.0 MiB apache-hive-3.1.3000.7.2.7.0-44-bin.tar.gz
377.2 MiB hadoop-3.1.1.7.2.7.0-44.tar.gz
370.4 MiB hbase-2.2.6.7.2.7.0-44-bin.tar.gz
258.3 MiB ranger-2.1.0.7.2.7.0-44-admin.tar.gz
63.4 MiB tez-0.9.1.7.2.7.0-44-minimal.tar.gz
{code}
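To put a number on the listing above, here is a minimal sketch that sums the printed sizes. It assumes the sizes are exactly as printed (the 1.1 GiB figure is already rounded), so the result is only approximate. The helper name `total_mib` is made up for illustration.

```python
# Sizes copied from the listing above; units are as printed there.
LISTING = """\
458.2 MiB llvm-5.0.1-asserts-p3-gcc-7.5.0-ec2-package-ubuntu-16-04.tar.gz
373.4 MiB llvm-5.0.1-p3-gcc-7.5.0-ec2-package-ubuntu-16-04.tar.gz
1.1 GiB kudu-6a7cadc7e-gcc-7.5.0-ec2-package-ubuntu-16-04.tar.gz
333.0 MiB apache-hive-3.1.3000.7.2.7.0-44-bin.tar.gz
377.2 MiB hadoop-3.1.1.7.2.7.0-44.tar.gz
370.4 MiB hbase-2.2.6.7.2.7.0-44-bin.tar.gz
258.3 MiB ranger-2.1.0.7.2.7.0-44-admin.tar.gz
63.4 MiB tez-0.9.1.7.2.7.0-44-minimal.tar.gz
"""

# Conversion factors to MiB for the units that appear in the listing.
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def total_mib(listing):
    """Sum the '<size> <unit> <name>' lines of a listing, in MiB."""
    total = 0.0
    for line in listing.splitlines():
        if not line.strip():
            continue  # skip blank lines
        size, unit, _name = line.split(maxsplit=2)
        total += float(size) * UNIT_TO_MIB[unit]
    return total


if __name__ == "__main__":
    mib = total_mib(LISTING)
    print(f"{mib:.1f} MiB (~{mib / 1024:.2f} GiB)")
```

These eight files alone add up to roughly 3.3 GiB, with the Kudu package accounting for about a third of that.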
Downloading from S3 is very slow in China, and possibly in other regions as well. One solution is to move our dependencies to Apache released versions (IMPALA-10408) so that they can be downloaded from Apache mirrors.

Another solution is to provide alternative download sources, such as Alibaba Cloud or qcloud (Tencent Cloud), so that developers can choose a source or set up their own.
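
As a rough illustration of configurable download sources, a sketch along these lines might work. It is not taken from Impala's build scripts: the environment variable `ARTIFACT_MIRRORS`, the default URL, and the function `candidate_urls` are all hypothetical names chosen for this example.

```python
import os

# Hypothetical names: neither the variable nor the default URL comes from
# Impala's build scripts; they only illustrate the idea.
MIRROR_ENV_VAR = "ARTIFACT_MIRRORS"
DEFAULT_SOURCES = ["https://default-bucket.example.com"]


def candidate_urls(relative_path, environ=None):
    """Return URLs to try in order: user-configured mirrors, then defaults."""
    environ = os.environ if environ is None else environ
    raw = environ.get(MIRROR_ENV_VAR, "")
    # Comma-separated list of mirror base URLs; empty entries are ignored.
    mirrors = [m.strip().rstrip("/") for m in raw.split(",") if m.strip()]
    # Keep the defaults as a fallback, without duplicating a mirror.
    bases = mirrors + [s for s in DEFAULT_SOURCES if s not in mirrors]
    return [f"{base}/{relative_path.lstrip('/')}" for base in bases]


if __name__ == "__main__":
    for url in candidate_urls("hadoop/hadoop-3.1.1.7.2.7.0-44.tar.gz"):
        print(url)
```

A download step could then try each URL in turn and stop at the first one that succeeds, so a developer in a region with slow S3 access only needs to set the variable to a nearby mirror.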


