Posted to dev@tika.apache.org by Mark Hissink Muller <XM...@kombit.dk> on 2019/11/28 09:46:16 UTC

Concern about tika-parsers' dependencies

Hi all,

I would like to voice my concern about the number of dependencies pulled in by org.apache.tika:tika-parsers:jar:1.22.

I recently needed to detect a charset, and I found CharsetDetector in the tika-parsers jar. It seemed like a small and easy way to zoom in on an encoding problem.
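For comparison, a narrow encoding check can sometimes be done with the JDK alone. The sketch below is not Tika's CharsetDetector and does no statistical detection; it only reports which charsets from a caller-supplied candidate list decode the bytes without errors. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CharsetCandidates {

    /** Returns the candidates that decode the bytes with no malformed or unmappable input. */
    static List<Charset> decodableWith(byte[] data, List<Charset> candidates) {
        List<Charset> ok = new ArrayList<>();
        for (Charset cs : candidates) {
            try {
                // REPORT makes the decoder throw instead of silently substituting U+FFFD.
                cs.newDecoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .decode(ByteBuffer.wrap(data));
                ok.add(cs);
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this charset; skip it.
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, encoded as ISO-8859-1: the byte 0xE9 is not valid UTF-8 here.
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        List<Charset> candidates =
                List.of(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(decodableWith(data, candidates));
    }
}
```

Single-byte charsets such as ISO-8859-1 accept every byte sequence, so this can rule out candidates like UTF-8 but cannot confirm which single-byte encoding was actually used.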

After I started to have problems starting my application (Cannot run program "C:\opt\jdk-13\bin\java.exe" ... CreateProcess error=206), I discovered the dependencies below were added by org.apache.tika:tika-parsers:jar:1.22 (extract from mvn dependency:tree).

I do note that the classpath for my application is already quite busy, but what tika-parsers added seems a bit over the top.
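As background, CreateProcess error=206 on Windows is commonly a sign that the command line used to launch the JVM, classpath included, has grown too long. The sketch below is one way to see how close a classpath is to that limit. It assumes the documented CreateProcess command-line maximum of 32,767 characters including the terminating null; the class and method names are hypothetical.

```java
import java.io.File;
import java.util.regex.Pattern;

public class ClasspathLength {

    // Documented maximum for the CreateProcess command line,
    // counted including the terminating null character.
    static final int WINDOWS_COMMAND_LINE_LIMIT = 32_767;

    /** Number of entries in a classpath string split on the given separator. */
    static int entryCount(String classpath, String separator) {
        if (classpath.isEmpty()) {
            return 0;
        }
        return classpath.split(Pattern.quote(separator)).length;
    }

    /** True if the string alone would not fit in a CreateProcess command line. */
    static boolean exceedsWindowsLimit(String commandLine) {
        // ">=" because the limit includes the terminating null.
        return commandLine.length() >= WINDOWS_COMMAND_LINE_LIMIT;
    }

    public static void main(String[] args) {
        // Classpath of the JVM running this check, e.g. when started from a build tool.
        String cp = System.getProperty("java.class.path", "");
        System.out.println("classpath entries: " + entryCount(cp, File.pathSeparator));
        System.out.println("classpath length:  " + cp.length());
        System.out.println("over limit alone:  " + exceedsWindowsLimit(cp));
    }
}
```

The real command line also contains the java executable path, JVM options and the main class, so a classpath somewhat below the limit can still fail.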

Hope this helps.

Best, Mark


[INFO] +- org.apache.tika:tika-parsers:jar:1.22:compile
[INFO] |  +- org.apache.tika:tika-core:jar:1.22:compile
[INFO] |  +- org.glassfish.jaxb:jaxb-runtime:jar:2.3.2:compile
[INFO] |  |  +- org.glassfish.jaxb:txw2:jar:2.3.2:compile
[INFO] |  |  +- com.sun.istack:istack-commons-runtime:jar:3.0.8:compile
[INFO] |  |  +- org.jvnet.staxex:stax-ex:jar:1.8.1:compile
[INFO] |  |  \- com.sun.xml.fastinfoset:FastInfoset:jar:1.2.16:compile
[INFO] |  +- com.sun.activation:jakarta.activation:jar:1.2.1:compile
[INFO] |  +- xerces:xercesImpl:jar:2.12.0:compile
[INFO] |  |  \- xml-apis:xml-apis:jar:1.4.01:compile
[INFO] |  +- org.apache.commons:commons-lang3:jar:3.9:compile
[INFO] |  +- javax.annotation:javax.annotation-api:jar:1.3.2:compile
[INFO] |  +- org.gagravarr:vorbis-java-tika:jar:0.8:compile
[INFO] |  +- org.tallison:jmatio:jar:1.5:compile
[INFO] |  +- org.apache.james:apache-mime4j-core:jar:0.8.3:compile
[INFO] |  +- org.apache.james:apache-mime4j-dom:jar:0.8.3:compile
[INFO] |  +- org.apache.commons:commons-compress:jar:1.18:compile
[INFO] |  +- org.tukaani:xz:jar:1.8:compile
[INFO] |  +- com.epam:parso:jar:2.0.11:compile
[INFO] |  +- org.brotli:dec:jar:0.1.2:compile
[INFO] |  +- org.apache.pdfbox:pdfbox-tools:jar:2.0.16:compile
[INFO] |  +- org.apache.pdfbox:jempbox:jar:1.8.16:compile
[INFO] |  +- org.bouncycastle:bcmail-jdk15on:jar:1.62:compile
[INFO] |  |  \- org.bouncycastle:bcpkix-jdk15on:jar:1.62:compile
[INFO] |  +- org.bouncycastle:bcprov-jdk15on:jar:1.62:compile
[INFO] |  +- org.apache.poi:poi-scratchpad:jar:4.0.1:compile
[INFO] |  +- com.healthmarketscience.jackcess:jackcess:jar:3.0.1:compile
[INFO] |  +- com.healthmarketscience.jackcess:jackcess-encrypt:jar:3.0.0:compile
[INFO] |  +- org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1:compile
[INFO] |  +- org.ow2.asm:asm:jar:7.2-beta:compile
[INFO] |  +- com.googlecode.mp4parser:isoparser:jar:1.1.22:compile
[INFO] |  +- com.drewnoakes:metadata-extractor:jar:2.11.0:compile
[INFO] |  |  \- com.adobe.xmp:xmpcore:jar:5.1.3:compile
[INFO] |  +- de.l3s.boilerpipe:boilerpipe:jar:1.1.0:compile
[INFO] |  +- com.rometools:rome:jar:1.12.1:compile
[INFO] |  |  \- com.rometools:rome-utils:jar:1.12.1:compile
[INFO] |  +- org.gagravarr:vorbis-java-core:jar:0.8:compile
[INFO] |  +- com.googlecode.juniversalchardet:juniversalchardet:jar:1.0.3:compile
[INFO] |  +- org.codelibs:jhighlight:jar:1.0.3:compile
[INFO] |  +- com.pff:java-libpst:jar:0.8.1:compile
[INFO] |  +- com.github.junrar:junrar:jar:4.0.0:compile
[INFO] |  +- org.apache.cxf:cxf-rt-rs-client:jar:3.3.2:compile
[INFO] |  |  +- org.apache.cxf:cxf-rt-transports-http:jar:3.3.2:compile
[INFO] |  |  +- org.apache.cxf:cxf-core:jar:3.3.2:compile
[INFO] |  |  |  +- com.fasterxml.woodstox:woodstox-core:jar:5.0.3:compile
[INFO] |  |  |  |  \- org.codehaus.woodstox:stax2-api:jar:3.1.4:compile
[INFO] |  |  |  +- org.apache.ws.xmlschema:xmlschema-core:jar:2.2.4:compile
[INFO] |  |  |  \- org.glassfish.jaxb:jaxb-xjc:jar:2.3.2:compile
[INFO] |  |  |     +- org.glassfish.jaxb:xsom:jar:2.3.2:compile
[INFO] |  |  |     +- org.glassfish.jaxb:codemodel:jar:2.3.2:compile
[INFO] |  |  |     +- com.sun.xml.bind.external:rngom:jar:2.3.2:compile
[INFO] |  |  |     +- com.sun.xml.dtd-parser:dtd-parser:jar:1.4.1:compile
[INFO] |  |  |     +- com.sun.istack:istack-commons-tools:jar:3.0.8:compile
[INFO] |  |  |     |  \- org.apache.ant:ant:jar:1.10.5:compile
[INFO] |  |  |     |     \- org.apache.ant:ant-launcher:jar:1.10.5:compile
[INFO] |  |  |     \- com.sun.xml.bind.external:relaxng-datatype:jar:2.3.2:compile
[INFO] |  |  +- org.apache.cxf:cxf-rt-frontend-jaxrs:jar:3.3.2:compile
[INFO] |  |  |  +- jakarta.ws.rs:jakarta.ws.rs-api:jar:2.1.6:compile
[INFO] |  |  |  \- org.apache.cxf:cxf-rt-security:jar:3.3.2:compile
[INFO] |  |  +- javax.xml.ws:jaxws-api:jar:2.3.1:compile
[INFO] |  |  |  \- javax.xml.soap:javax.xml.soap-api:jar:1.4.0:compile
[INFO] |  |  +- com.sun.activation:javax.activation:jar:1.2.0:compile
[INFO] |  |  +- org.apache.geronimo.specs:geronimo-ws-metadata_2.0_spec:jar:1.1.3:compile
[INFO] |  |  +- com.sun.xml.messaging.saaj:saaj-impl:jar:1.5.1:compile
[INFO] |  |  |  +- jakarta.xml.soap:jakarta.xml.soap-api:jar:1.4.1:compile
[INFO] |  |  |  \- org.jvnet.mimepull:mimepull:jar:1.9.12:compile
[INFO] |  |  +- org.jacorb:jacorb-omgapi:jar:3.7:compile
[INFO] |  |  +- org.apache.geronimo.specs:geronimo-jta_1.1_spec:jar:1.1.1:compile
[INFO] |  |  \- org.jboss.spec.javax.rmi:jboss-rmi-api_1.0_spec:jar:1.0.6.Final:compile
[INFO] |  +- org.apache.commons:commons-exec:jar:1.3:compile
[INFO] |  +- org.apache.opennlp:opennlp-tools:jar:1.9.1:compile
[INFO] |  +- commons-io:commons-io:jar:2.6:compile
[INFO] |  +- com.googlecode.json-simple:json-simple:jar:1.1.1:compile
[INFO] |  +- com.github.openjson:openjson:jar:1.0.11:compile
[INFO] |  +- com.google.code.gson:gson:jar:2.8.6:compile
[INFO] |  +- org.slf4j:slf4j-api:jar:1.7.28:compile
[INFO] |  +- org.slf4j:jul-to-slf4j:jar:1.7.28:compile
[INFO] |  +- org.slf4j:jcl-over-slf4j:jar:1.7.28:compile
[INFO] |  +- edu.ucar:netcdf4:jar:4.5.5:compile
[INFO] |  |  \- net.jcip:jcip-annotations:jar:1.0:compile
[INFO] |  +- org.jdom:jdom2:jar:2.0.6:compile
[INFO] |  +- edu.ucar:grib:jar:4.5.5:compile
[INFO] |  |  \- org.itadaki:bzip2:jar:0.9.1:compile
[INFO] |  +- net.java.dev.jna:jna:jar:4.5.2:compile
[INFO] |  +- org.jsoup:jsoup:jar:1.12.1:compile
[INFO] |  +- com.google.protobuf:protobuf-java:jar:3.9.0:compile
[INFO] |  +- edu.ucar:cdm:jar:4.5.5:compile
[INFO] |  |  +- edu.ucar:udunits:jar:4.5.5:compile
[INFO] |  |  +- joda-time:joda-time:jar:2.10.4:compile
[INFO] |  |  +- org.quartz-scheduler:quartz:jar:2.3.1:compile
[INFO] |  |  +- net.sf.ehcache:ehcache-core:jar:2.6.2:compile
[INFO] |  |  \- com.beust:jcommander:jar:1.35:compile
[INFO] |  +- com.mchange:c3p0:jar:0.9.5.4:compile
[INFO] |  |  \- com.mchange:mchange-commons-java:jar:0.2.15:compile
[INFO] |  +- edu.ucar:httpservices:jar:4.5.5:compile
[INFO] |  +- org.apache.httpcomponents:httpmime:jar:4.5.10:compile
[INFO] |  +- org.apache.commons:commons-csv:jar:1.7:compile
[INFO] |  +- org.apache.sis.core:sis-utility:jar:0.8:compile
[INFO] |  |  \- javax.measure:unit-api:jar:1.0:compile
[INFO] |  +- org.apache.sis.storage:sis-netcdf:jar:0.8:compile
[INFO] |  |  +- org.apache.sis.storage:sis-storage:jar:0.8:compile
[INFO] |  |  |  \- org.apache.sis.core:sis-feature:jar:0.8:compile
[INFO] |  |  \- org.apache.sis.core:sis-referencing:jar:0.8:compile
[INFO] |  +- org.apache.sis.core:sis-metadata:jar:0.8:compile
[INFO] |  +- org.opengis:geoapi:jar:3.0.1:compile
[INFO] |  +- edu.usc.ir:sentiment-analysis-parser:jar:0.1:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-core:jar:2.10.0:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.10.0:compile
[INFO] |  +- org.apache.pdfbox:jbig2-imageio:jar:3.0.2:compile
[INFO] |  \- com.github.jai-imageio:jai-imageio-core:jar:1.4.0:compile
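To put a number on a listing like the one above, the output of mvn dependency:tree can be counted by nesting depth. This is a minimal sketch that assumes the line layout shown in this message ("[INFO] ", three-character indent segments, then "+- " or "\- " and the coordinates); the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DependencyTreeCount {

    // Matches e.g. "[INFO] |  +- group:artifact:jar:1.0:compile".
    // Group 1 is the indentation: zero or more 3-character segments ("|  " or "   ").
    private static final Pattern NODE =
            Pattern.compile("^\\[INFO\\] ((?:[| ] {2})*)[+\\\\]- \\S+$");

    /** Nesting depth of a tree line (0 for a top-level dependency), or -1 if it is not a node. */
    static int depth(String line) {
        Matcher m = NODE.matcher(line);
        return m.matches() ? m.group(1).length() / 3 : -1;
    }

    /** Number of tree nodes at or below the given depth. */
    static int countAtOrBelow(List<String> lines, int minDepth) {
        int count = 0;
        for (String line : lines) {
            if (depth(line) >= minDepth) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "[INFO] +- org.apache.tika:tika-parsers:jar:1.22:compile",
                "[INFO] |  +- org.apache.tika:tika-core:jar:1.22:compile",
                "[INFO] |  +- xerces:xercesImpl:jar:2.12.0:compile",
                "[INFO] |  |  \\- xml-apis:xml-apis:jar:1.4.01:compile");
        // Depth 1 and deeper are the artifacts brought in by the top-level dependency.
        System.out.println("transitive artifacts: " + countAtOrBelow(sample, 1));
    }
}
```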

Re: Concern about tika-parsers' dependencies

Posted by Sergey Beryozkin <sb...@gmail.com>.
Hi

We have an issue for this assigned to me. I hope to complete Bob's
modularization effort as soon as possible.

Sergey
