Posted to dev@tika.apache.org by "Manfred Schenk (JIRA)" <ji...@apache.org> on 2016/08/05 09:38:20 UTC

[jira] [Comment Edited] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

    [ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409222#comment-15409222 ] 

Manfred Schenk edited comment on TIKA-1367 at 8/5/16 9:38 AM:
--------------------------------------------------------------

Definitely, much more detailed information about the tika-parsers dependencies is needed.

This is the current list of dependencies that Gradle pulls in when using tika-parsers:
{noformat}
+--- org.apache.tika:tika-parsers:1.13
|    +--- org.apache.tika:tika-core:1.13
|    +--- org.gagravarr:vorbis-java-tika:0.8
|    |    \--- org.apache.tika:tika-core:1.12 -> 1.13
|    +--- com.healthmarketscience.jackcess:jackcess:2.1.3
|    |    +--- commons-lang:commons-lang:2.6
|    |    \--- commons-logging:commons-logging:1.1.3 -> 1.2
|    +--- com.healthmarketscience.jackcess:jackcess-encrypt:2.1.1
|    |    \--- com.healthmarketscience.jackcess:jackcess:2.1.0 -> 2.1.3 (*)
|    +--- net.sourceforge.jmatio:jmatio:1.0
|    +--- org.apache.james:apache-mime4j-core:0.7.2
|    +--- org.apache.james:apache-mime4j-dom:0.7.2
|    |    \--- org.apache.james:apache-mime4j-core:0.7.2
|    +--- org.apache.commons:commons-compress:1.11
|    +--- org.tukaani:xz:1.5
|    +--- commons-codec:commons-codec:1.10
|    +--- org.apache.pdfbox:pdfbox:2.0.1
|    |    +--- org.apache.pdfbox:fontbox:2.0.1
|    |    |    \--- commons-logging:commons-logging:1.2
|    |    \--- commons-logging:commons-logging:1.2
|    +--- org.apache.pdfbox:pdfbox-tools:2.0.1
|    |    \--- org.apache.pdfbox:pdfbox-debugger:2.0.1
|    |         \--- org.apache.pdfbox:pdfbox:2.0.1 (*)
|    +--- org.apache.pdfbox:jempbox:1.8.12
|    +--- org.bouncycastle:bcmail-jdk15on:1.54
|    |    +--- org.bouncycastle:bcprov-jdk15on:1.54
|    |    \--- org.bouncycastle:bcpkix-jdk15on:1.54
|    |         \--- org.bouncycastle:bcprov-jdk15on:1.54
|    +--- org.bouncycastle:bcprov-jdk15on:1.54
|    +--- org.apache.poi:poi:3.15-beta1
|    |    \--- commons-codec:commons-codec:1.10
|    +--- org.apache.poi:poi-scratchpad:3.15-beta1
|    |    \--- org.apache.poi:poi:3.15-beta1 (*)
|    +--- org.apache.poi:poi-ooxml:3.15-beta1
|    |    +--- org.apache.poi:poi:3.15-beta1 (*)
|    |    +--- org.apache.poi:poi-ooxml-schemas:3.15-beta1
|    |    |    \--- org.apache.xmlbeans:xmlbeans:2.6.0
|    |    \--- com.github.virtuald:curvesapi:1.03
|    +--- org.ccil.cowan.tagsoup:tagsoup:1.2.1
|    +--- org.ow2.asm:asm:5.0.4
|    +--- com.googlecode.mp4parser:isoparser:1.1.18
|    +--- com.drewnoakes:metadata-extractor:2.8.1 (*)
|    +--- de.l3s.boilerpipe:boilerpipe:1.1.0
|    +--- com.rometools:rome:1.5.1
|    |    +--- com.rometools:rome-utils:1.5.1
|    |    \--- org.slf4j:slf4j-api:1.7.7 -> 1.7.14
|    +--- org.gagravarr:vorbis-java-core:0.8
|    +--- com.googlecode.juniversalchardet:juniversalchardet:1.0.3
|    +--- org.codelibs:jhighlight:1.0.2
|    +--- com.pff:java-libpst:0.8.1
|    +--- com.github.junrar:junrar:0.7
|    |    +--- commons-logging:commons-logging-api:1.1
|    |    \--- org.apache.commons:commons-vfs2:2.0
|    |         +--- commons-logging:commons-logging:1.1.1 -> 1.2
|    |         +--- org.apache.maven.scm:maven-scm-api:1.4
|    |         |    \--- org.codehaus.plexus:plexus-utils:1.5.6
|    |         \--- org.apache.maven.scm:maven-scm-provider-svnexe:1.4
|    |              +--- org.apache.maven.scm:maven-scm-provider-svn-commons:1.4
|    |              |    +--- org.apache.maven.scm:maven-scm-api:1.4 (*)
|    |              |    \--- org.codehaus.plexus:plexus-utils:1.5.6
|    |              +--- regexp:regexp:1.3
|    |              +--- org.apache.maven.scm:maven-scm-api:1.4 (*)
|    |              \--- org.codehaus.plexus:plexus-utils:1.5.6
|    +--- org.apache.cxf:cxf-rt-rs-client:3.0.3
|    |    +--- org.apache.cxf:cxf-rt-transports-http:3.0.3
|    |    |    \--- org.apache.cxf:cxf-core:3.0.3
|    |    |         +--- org.codehaus.woodstox:woodstox-core-asl:4.4.1
|    |    |         |    \--- org.codehaus.woodstox:stax2-api:3.1.4
|    |    |         \--- org.apache.ws.xmlschema:xmlschema-core:2.1.0
|    |    +--- org.apache.cxf:cxf-core:3.0.3 (*)
|    |    \--- org.apache.cxf:cxf-rt-frontend-jaxrs:3.0.3
|    |         +--- org.apache.cxf:cxf-core:3.0.3 (*)
|    |         +--- javax.ws.rs:javax.ws.rs-api:2.0.1
|    |         +--- javax.annotation:javax.annotation-api:1.2
|    |         \--- org.apache.cxf:cxf-rt-transports-http:3.0.3 (*)
|    +--- org.apache.opennlp:opennlp-tools:1.5.3
|    |    +--- org.apache.opennlp:opennlp-maxent:3.0.3
|    |    \--- net.sf.jwordnet:jwnl:1.3.3
|    +--- commons-io:commons-io:2.4
|    +--- org.apache.commons:commons-exec:1.3
|    +--- com.googlecode.json-simple:json-simple:1.1.1
|    +--- org.json:json:20140107
|    +--- com.google.code.gson:gson:2.2.4 -> 2.4
|    +--- edu.ucar:netcdf4:4.5.5
|    |    +--- edu.ucar:cdm:4.5.5
|    |    |    +--- edu.ucar:udunits:4.5.5
|    |    |    |    +--- joda-time:joda-time:2.2
|    |    |    |    \--- net.jcip:jcip-annotations:1.0
|    |    |    +--- edu.ucar:httpservices:4.5.5
|    |    |    |    +--- net.jcip:jcip-annotations:1.0
|    |    |    |    +--- org.apache.httpcomponents:httpclient:4.2.6 (*)
|    |    |    |    +--- org.apache.httpcomponents:httpcore:4.2.5
|    |    |    |    +--- org.apache.httpcomponents:httpmime:4.2.6 (*)
|    |    |    |    \--- org.slf4j:slf4j-api:1.7.7 -> 1.7.14
|    |    |    +--- org.apache.httpcomponents:httpcore:4.2.5
|    |    |    +--- joda-time:joda-time:2.2
|    |    |    +--- org.slf4j:slf4j-api:1.7.7 -> 1.7.14
|    |    |    +--- org.jdom:jdom2:2.0.4
|    |    |    +--- net.jcip:jcip-annotations:1.0
|    |    |    +--- org.quartz-scheduler:quartz:2.2.0
|    |    |    |    +--- c3p0:c3p0:0.9.1.1
|    |    |    |    \--- org.slf4j:slf4j-api:1.6.6 -> 1.7.14
|    |    |    +--- com.google.protobuf:protobuf-java:2.5.0
|    |    |    +--- net.sf.ehcache:ehcache-core:2.6.2
|    |    |    |    \--- org.slf4j:slf4j-api:1.6.1 -> 1.7.14
|    |    |    +--- com.google.guava:guava:17.0 -> 19.0
|    |    |    +--- org.itadaki:bzip2:0.9.1
|    |    |    +--- com.beust:jcommander:1.35
|    |    |    \--- org.slf4j:jcl-over-slf4j:1.7.7 -> 1.7.14
|    |    |         \--- org.slf4j:slf4j-api:1.7.14
|    |    +--- net.jcip:jcip-annotations:1.0
|    |    +--- net.java.dev.jna:jna:4.1.0
|    |    \--- org.slf4j:slf4j-api:1.7.7 -> 1.7.14
|    +--- edu.ucar:grib:4.5.5
|    |    +--- edu.ucar:cdm:4.5.5 (*)
|    |    +--- com.google.protobuf:protobuf-java:2.5.0
|    |    +--- org.jdom:jdom2:2.0.4
|    |    +--- org.jsoup:jsoup:1.7.2
|    |    +--- org.slf4j:slf4j-api:1.7.7 -> 1.7.14
|    |    +--- net.jcip:jcip-annotations:1.0
|    |    +--- edu.ucar:jj2000:5.2
|    |    \--- org.itadaki:bzip2:0.9.1
|    +--- edu.ucar:cdm:4.5.5 (*)
|    +--- edu.ucar:httpservices:4.5.5 (*)
|    +--- org.apache.commons:commons-csv:1.0
|    +--- org.apache.sis.core:sis-utility:0.6
|    |    \--- org.opengis:geoapi:3.0.0
|    |         \--- javax.measure:jsr-275:0.9.3
|    +--- org.apache.sis.storage:sis-netcdf:0.6
|    |    +--- org.apache.sis.storage:sis-storage:0.6
|    |    |    +--- org.apache.sis.core:sis-metadata:0.6
|    |    |    |    +--- org.apache.sis.core:sis-utility:0.6 (*)
|    |    |    |    \--- org.opengis:geoapi:3.0.0 (*)
|    |    |    +--- org.apache.sis.core:sis-referencing:0.6
|    |    |    |    +--- org.apache.sis.core:sis-utility:0.6 (*)
|    |    |    |    +--- org.apache.sis.core:sis-metadata:0.6 (*)
|    |    |    |    \--- org.opengis:geoapi:3.0.0 (*)
|    |    |    \--- org.opengis:geoapi:3.0.0 (*)
|    |    +--- org.apache.sis.core:sis-metadata:0.6 (*)
|    |    +--- org.apache.sis.core:sis-referencing:0.6 (*)
|    |    \--- org.opengis:geoapi:3.0.0 (*)
|    +--- org.apache.sis.core:sis-metadata:0.6 (*)
|    +--- org.opengis:geoapi:3.0.0 (*)
|    \--- com.fasterxml.jackson.core:jackson-core:2.7.1

{noformat}

Looking at that list, I wonder whether some of these dependencies are really needed at runtime, e.g. org.apache.maven.scm:maven-scm-provider-svn-commons.
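
To make a list like this easier to document and keep up to date, the tree could be flattened into a sorted set of unique coordinates with their resolved versions. The following is only a minimal sketch, assuming the input is the plain-text tree as printed by Gradle in the format shown above; the class name DependencyTreeFlattener and its flatten method are hypothetical helpers, not part of Tika or Gradle.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Tika or Gradle): flattens a textual
// Gradle dependency tree into unique "group:artifact:version" entries.
public class DependencyTreeFlattener {

    // group:artifact:version at the end of a line, optionally followed by
    // "-> resolvedVersion" (conflict resolution) and "(*)" (subtree omitted).
    private static final Pattern COORDINATE = Pattern.compile(
            "([^\\s:|+\\\\]+):([^\\s:]+):([^\\s:]+)"
            + "(?:\\s+->\\s+(\\S+))?(?:\\s+\\(\\*\\))?\\s*$");

    public static SortedSet<String> flatten(List<String> treeLines) {
        SortedSet<String> result = new TreeSet<>();
        for (String line : treeLines) {
            Matcher m = COORDINATE.matcher(line);
            if (!m.find()) {
                continue; // not a dependency line (blank line, markup, ...)
            }
            // Prefer the version Gradle resolved to, if one is shown.
            String version = m.group(4) != null ? m.group(4) : m.group(3);
            result.add(m.group(1) + ":" + m.group(2) + ":" + version);
        }
        return result;
    }

    public static void main(String[] args) {
        // A few lines copied from the tree above.
        List<String> sample = Arrays.asList(
                "+--- org.apache.tika:tika-parsers:1.13",
                "|    +--- org.apache.tika:tika-core:1.13",
                "|    +--- org.gagravarr:vorbis-java-tika:0.8",
                "|    |    \\--- org.apache.tika:tika-core:1.12 -> 1.13",
                "|    +--- com.healthmarketscience.jackcess:jackcess-encrypt:2.1.1",
                "|    |    \\--- com.healthmarketscience.jackcess:jackcess:2.1.0 -> 2.1.3 (*)");
        // Prints each unique coordinate once, in sorted order.
        for (String coordinate : flatten(sample)) {
            System.out.println(coordinate);
        }
    }
}
```

Because the result is a set keyed on the resolved version, an artifact that appears several times in the tree (such as tika-core here) is listed only once, which is closer to what a documentation page would need than the raw tree.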



> Tika documentation should list tika-parsers parser dependencies
> ---------------------------------------------------------------
>
>                 Key: TIKA-1367
>                 URL: https://issues.apache.org/jira/browse/TIKA-1367
>             Project: Tika
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Sergey Beryozkin
>             Fix For: 1.14
>
>
> The tika-parsers module has many strong transitive parser dependencies. Maven users of tika-parsers have to exclude all the transitive dependencies manually. Documenting the list of the existing transitive dependencies and keeping the list up to date will help developers exclude the libraries not needed for a given project.


