Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2022/02/15 19:49:00 UTC
[jira] [Created] (IMPALA-11125) Revisit the minimal-s3a-aws-sdk jar
Joe McDonnell created IMPALA-11125:
--------------------------------------
Summary: Revisit the minimal-s3a-aws-sdk jar
Key: IMPALA-11125
URL: https://issues.apache.org/jira/browse/IMPALA-11125
Project: IMPALA
Issue Type: Improvement
Components: Infrastructure
Affects Versions: Impala 4.1.0
Reporter: Joe McDonnell
The impala-minimal-s3a-aws-sdk jar takes the com.amazonaws aws-java-sdk-bundle jar and filters out many unneeded entries, which shrinks it from 183MB to 89MB.
Unpacking the jar shows that it still contains content that could be removed. It includes many services that we don't use (some of which may not have existed when we first set up the filtering):
{noformat}
$ ls com/amazonaws/services | wc -l
116
$ ls com/amazonaws/services
accessanalyzer
acmpca
apigatewaymanagementapi
appconfig
appflow
applicationinsights
appregistry
augmentedairuntime
...{noformat}
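To see which of these service packages actually account for the jar's size, one option is to sum entry sizes per service straight from the jar listing instead of unpacking it. The helper below is a hypothetical sketch (service_sizes is not part of the Impala build). It assumes the column layout of Info-ZIP `unzip -l`, with the uncompressed length in column 1 and the entry name in column 4, and the jar file name in the usage comment is an assumption about your local build output.

```shell
# Hypothetical helper: sum uncompressed bytes per com/amazonaws/services/<name>
# package from an `unzip -l` listing on stdin, largest first.
# Usage (jar path is an assumption; point it at the built jar):
#   unzip -l impala-minimal-s3a-aws-sdk.jar | service_sizes | head -20
service_sizes() {
  awk '
    # `unzip -l` rows: column 1 = uncompressed length, column 4 = entry name.
    index($4, "com/amazonaws/services/") == 1 {
      n = split($4, parts, "/")
      # parts[4] is the service package name, e.g. "s3" or "appflow".
      if (n >= 5 && parts[4] != "") bytes[parts[4]] += $1
    }
    END {
      # %.0f avoids 32-bit overflow of %d in some awk implementations.
      for (svc in bytes) printf "%.0f\t%s\n", bytes[svc], svc
    }
  ' | sort -rn
}
```

Header, separator, and summary lines of the listing do not start with the services prefix in column 4, so they are skipped without special handling.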
Separately, the models directory takes up a lot of space:
{noformat}
$ du -ch models
807M models
807M total
$ ls models | wc -l
468
$ ls models
a4b-2017-11-09-intermediate.json
a4b-2017-11-09-model.json
...{noformat}
These are JSON files, so they compress well, but they still take up space.
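Because the models are JSON, the 807M reported by du overstates what they cost inside the jar. A rough way to get both the uncompressed and compressed totals is to read per-entry sizes from `unzip -v`. The helper below (models_footprint) is hypothetical and assumes Info-ZIP's `unzip -v` layout: uncompressed length in column 1, compressed size in column 3, and entry name in column 8.

```shell
# Hypothetical helper: total the models/ entries from an `unzip -v` listing.
# Usage (jar path is an assumption):
#   unzip -v impala-minimal-s3a-aws-sdk.jar | models_footprint
models_footprint() {
  awk '
    # `unzip -v` rows: 1 = Length (uncompressed), 3 = Size (compressed), 8 = Name.
    index($8, "models/") == 1 { raw += $1; packed += $3; n++ }
    END {
      printf "%d files, %.0f bytes uncompressed, %.0f bytes compressed\n", n, raw, packed
    }
  '
}
```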
We should either revisit our exclusions so that fewer of these models get packaged, or stop using aws-java-sdk-bundle and instead depend on individual jars such as aws-java-sdk-s3 and aws-java-sdk-dynamodb.
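For the first option, the exclusion list could be derived from a keep-list rather than maintained by hand, so that services added to the bundle in later SDK releases are excluded by default. This is only a sketch: exclusion_patterns is a hypothetical helper, its output assumes the exclusions are written as Ant-style path globs, and the keep-list in the usage comment (s3, dynamodbv2, securitytoken) is an illustrative guess at what S3A needs, not a verified list.

```shell
# Hypothetical helper: read service package names on stdin and print an
# exclusion glob for every service NOT named in the arguments (the keep-list).
# Usage, run inside the unpacked jar (keep-list is an assumption):
#   ls com/amazonaws/services | exclusion_patterns s3 dynamodbv2 securitytoken
exclusion_patterns() {
  local keep=" $* "
  local svc
  while IFS= read -r svc; do
    [ -z "$svc" ] && continue
    case "$keep" in
      *" $svc "*) ;;  # on the keep-list: leave it in the jar
      *) printf 'com/amazonaws/services/%s/**\n' "$svc" ;;
    esac
  done
}
```

Inverting the list this way means a newly added service shows up as an exclusion automatically, instead of silently growing the jar until someone audits it again.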