Posted to dev@any23.apache.org by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/02/06 02:31:00 UTC

[jira] [Updated] (ANY23-447) Reduce Any23 dependency bloat

     [ https://issues.apache.org/jira/browse/ANY23-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated ANY23-447:
---------------------------------------
    Attachment: output.txt

> Reduce Any23 dependency bloat
> -----------------------------
>
>                 Key: ANY23-447
>                 URL: https://issues.apache.org/jira/browse/ANY23-447
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.3
>            Reporter: David Cockbill
>            Priority: Minor
>         Attachments: output.txt
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Compelled by email conversation with Hans Brende:
> {code:java}
> David, unfortunately this move won't reduce the number of core dependencies
> we have: the plugins and service modules are not dependencies of the core
> module. However, it might be useful if you posted an issue about the
> dependency bloat, including the various exclusions you are using: we might
> be able to mitigate the problem.
> {code}
> This arose from having to exclude dependencies in the pom.xml for a product. (Note that not much thought went into the exclusions; I was trying to get the code size down before a release.) Section of pom.xml:
> {code:xml}
>     <dependency>
>       <groupId>org.apache.any23</groupId>
>       <artifactId>apache-any23-core</artifactId>
>         <exclusions>
>           <!-- Any23 brings in a lot of dependencies, which bloat the shaded jar.
>                This is an attempt to reduce this by excluding packages
>                that we may not be using as part of Any23.
>                NOTE: If an excluded dependency is required at runtime, then a
>                java.lang.NoClassDefFoundError is thrown.  -->
>           
>           <exclusion>
>             <groupId>org.apache.tika</groupId>
>             <artifactId>tika-parsers</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.bouncycastle</groupId>
>             <artifactId>bcmail-jdk15on</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.bouncycastle</groupId>
>             <artifactId>bcprov-jdk15on</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>edu.ucar</groupId>
>             <artifactId>cdm</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>net.sf.trove4j</groupId>
>             <artifactId>trove4j</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.cxf</groupId>
>             <artifactId>cxf-rt-rs-client</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>com.github.ben-manes.caffeine</groupId>
>             <artifactId>caffeine</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.opengis</groupId>
>             <artifactId>geoapi</artifactId>
>           </exclusion>  
>           <exclusion>
>             <groupId>com.drewnoakes</groupId>
>             <artifactId>metadata-extractor</artifactId>
>           </exclusion> 
>           <exclusion>
>             <groupId>org.eclipse.rdf4j</groupId>
>             <artifactId>rdf4j-repository-sail</artifactId>
>           </exclusion> 
>           <exclusion>
>             <groupId>org.eclipse.rdf4j</groupId>
>             <artifactId>rdf4j-sail-memory</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.tukaani</groupId>
>             <artifactId>xz</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.codelibs</groupId>
>             <artifactId>jhighlight</artifactId>
>           </exclusion> 
>           <exclusion>
>             <groupId>org.gagravarr</groupId>
>             <artifactId>vorbis-java-core</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.gagravarr</groupId>
>             <artifactId>vorbis-java-tika</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.opennlp</groupId>
>             <artifactId>opennlp-tools</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.pdfbox</groupId>
>             <artifactId>pdfbox</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.pdfbox</groupId>
>             <artifactId>pdfbox-tools</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.poi</groupId>
>             <artifactId>poi-scratchpad</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>edu.ucar</groupId>
>             <artifactId>grib</artifactId>
>           </exclusion>  
>           <exclusion>
>             <groupId>com.googlecode.mp4parser</groupId>
>             <artifactId>isoparser</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>com.healthmarketscience.jackcess</groupId>
>             <artifactId>jackcess</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>com.healthmarketscience.jackcess</groupId>
>             <artifactId>jackcess-encrypt</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.sis.core</groupId>
>             <artifactId>sis-utility</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.sis.storage</groupId>
>             <artifactId>sis-netcdf</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.sis.core</groupId>
>             <artifactId>sis-metadata</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.eclipse.rdf4j</groupId>
>             <artifactId>rdf4j-rio-trix</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.yaml</groupId>
>             <artifactId>snakeyaml</artifactId>
>           </exclusion>        
>           <exclusion>
>             <groupId>org.eclipse.rdf4j</groupId>
>             <artifactId>rdf4j-rio-turtle</artifactId>
>           </exclusion>         
>         </exclusions>
>     </dependency>
> {code}
> Some background that may be useful from my notes:
> {code:java}
> Whilst adding Any23 to the product, the Any23 Core package was causing Lintian to fail.
> Lintian is a Debian package checker written in Perl. It uses Archive::Zip to unpack any .jar file in the Debian package. This unzip library does not handle the Zip64 format, which causes the failure. The original zip format has various restrictions, one of which is the number of files in the archive. Therefore, if the number of files in the product's jar exceeds this limit (65535), a Zip64 format file is produced instead of a standard zip file.
> The Any23 Core library does seem quite excessive in what it pulls in. Running the following, the entry count reported for the product goes from 40490 to 78513:
> zipinfo -1 product.jar | wc -l
> {code}
> This Lintian failure on a Linux build was the original push for the exclusions; however, the product .jar also increased in a similar fashion.
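
The entry-count check described in the notes above can also be done from Java, for example as a guard in a build step. The following is a minimal sketch, not part of Any23 or Lintian: the class name `JarEntryCount`, its method names, and the default path `product.jar` are illustrative assumptions, and the threshold is simply the 65535 figure quoted above (the exact boundary at which a given tool switches to Zip64 may differ).

```java
import java.io.IOException;
import java.util.zip.ZipFile;

// Hypothetical helper: counts the entries in a jar and compares the count
// with the classic zip limit quoted in the issue description.
class JarEntryCount {
    // Entry limit of the classic (non-Zip64) format, as quoted in the issue.
    static final int CLASSIC_ZIP_MAX_ENTRIES = 65535;

    // True if an archive with this many entries is over the classic limit.
    static boolean exceedsClassicZipLimit(long entryCount) {
        return entryCount > CLASSIC_ZIP_MAX_ENTRIES;
    }

    // Returns the number of entries in the archive, comparable to
    // `zipinfo -1 product.jar | wc -l`.
    static int countEntries(String jarPath) throws IOException {
        try (ZipFile zip = new ZipFile(jarPath)) {
            return zip.size();
        }
    }

    public static void main(String[] args) throws IOException {
        // "product.jar" is a placeholder; pass the real path as the first argument.
        String jarPath = args.length > 0 ? args[0] : "product.jar";
        int entries = countEntries(jarPath);
        System.out.println(jarPath + ": " + entries + " entries, over classic zip limit: "
                + exceedsClassicZipLimit(entries));
    }
}
```

With the two counts reported above, 40490 is under the limit and 78513 is over it, which is consistent with the jar only becoming a problem for Lintian after Any23 Core was added.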



--
This message was sent by Atlassian Jira
(v8.20.1#820001)