Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2022/02/15 19:49:00 UTC

[jira] [Created] (IMPALA-11125) Revisit the minimal-s3a-aws-sdk jar

Joe McDonnell created IMPALA-11125:
--------------------------------------

             Summary: Revisit the minimal-s3a-aws-sdk jar
                 Key: IMPALA-11125
                 URL: https://issues.apache.org/jira/browse/IMPALA-11125
             Project: IMPALA
          Issue Type: Improvement
          Components: Infrastructure
    Affects Versions: Impala 4.1.0
            Reporter: Joe McDonnell


The impala-minimal-s3a-aws-sdk jar takes the com.amazonaws aws-java-sdk-bundle and filters out unneeded items. This filtering shrinks the jar from 183MB to 89MB.

Unpacking the jar shows that it still contains content that can be removed. It includes services that we don't use (some of which may not have been there when we first did this):
{noformat}
$ ls com/amazonaws/services | wc -l
116
$ ls com/amazonaws/services
accessanalyzer
acmpca
apigatewaymanagementapi
appconfig
appflow
applicationinsights
appregistry
augmentedairuntime
...{noformat}
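
To decide which services are worth excluding, it may help to rank them by size. The following is a minimal sketch, assuming the jar has been unpacked into a directory with the com/amazonaws/services layout shown above; the function name service_sizes is hypothetical and not part of the Impala build.

```shell
# Sketch: print "<bytes><TAB><service>" for each service package, largest first.
# Assumes $1 is the root of an unpacked jar (hypothetical helper, not Impala code).
service_sizes() {
  local root="$1" dir bytes
  for dir in "$root"/com/amazonaws/services/*/; do
    # Skip the literal pattern if the glob matched nothing.
    [ -d "$dir" ] || continue
    # Sum the byte sizes of all files under this service directory.
    bytes=$(find "$dir" -type f -exec cat {} + | wc -c)
    printf '%s\t%s\n' "$((bytes))" "$(basename "$dir")"
  done | sort -rn
}
```

Running service_sizes on the unpack directory and piping the result through head would show the largest packages first. Note that these are uncompressed sizes, so they overstate each package's share of the jar.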
Separately, the models directory takes up a lot of space:
{noformat}
$ du -ch models
807M    models
807M    total
$ ls models | wc -l
468
$ ls models
a4b-2017-11-09-intermediate.json
a4b-2017-11-09-model.json
...{noformat}
These are JSON files that compress well, but they still take up space in the jar.
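
One rough way to estimate how much the models contribute after compression is to compare their raw size with their gzip size. This is only a sketch: the function name models_footprint is hypothetical, and gzip over the concatenated files is an approximation of the per-entry deflate compression a jar uses, so the real figure will differ somewhat.

```shell
# Sketch: report raw and gzip-compressed byte counts for the JSON files under $1.
# gzip over the concatenated stream only approximates per-entry jar compression.
models_footprint() {
  local dir="$1" raw packed
  raw=$(find "$dir" -type f -name '*.json' -exec cat {} + | wc -c)
  packed=$(find "$dir" -type f -name '*.json' -exec cat {} + | gzip -c | wc -c)
  printf 'raw_bytes=%s gzip_bytes=%s\n' "$((raw))" "$((packed))"
}
```

Calling models_footprint on the unpacked models directory gives a first-order estimate of the space that excluding the models would save.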

We should either revisit our exclusions and try to avoid packaging some of these models, or we should try to avoid using aws-java-sdk-bundle and instead pick out individual jars like aws-java-sdk-s3 and aws-java-sdk-dynamodb.
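
For the first option, a starting point could be to list every packaged service that is not on a keep-list. The sketch below assumes an unpacked jar and a caller-supplied keep-list; the function name exclusion_candidates and the Ant-style output pattern are hypothetical, and the pattern syntax would need to match whatever filter mechanism the build actually uses. The right keep-list is also an assumption to verify, since S3A may depend on services beyond the obvious ones.

```shell
# Sketch: print an exclusion pattern for every service directory that is not
# in the keep-list. Usage: exclusion_candidates <unpacked-jar-root> <service>...
# The output pattern format is a hypothetical Ant-style glob.
exclusion_candidates() {
  local root="$1"; shift
  local dir name wanted keep
  for dir in "$root"/com/amazonaws/services/*/; do
    # Skip the literal pattern if the glob matched nothing.
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    keep=no
    for wanted in "$@"; do
      if [ "$name" = "$wanted" ]; then
        keep=yes
      fi
    done
    if [ "$keep" = no ]; then
      printf 'com/amazonaws/services/%s/**\n' "$name"
    fi
  done
}
```

The output is meant for review rather than direct use: each candidate should be checked against what the S3A connector actually loads before it is added to the exclusions.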



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
