Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/07/23 21:32:04 UTC

[jira] [Commented] (NUTCH-2048) parse-tika: fix dependencies in plugin.xml

    [ https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639393#comment-14639393 ] 

Sebastian Nagel commented on NUTCH-2048:
----------------------------------------

Hi, it's not that trivial. There are more duplicates:
{noformat}
% perl -lne 'push @{$h{$2}}, $1 if /<library name="((.+?)-[.0-9]*\.jar)"/;
                      END { for (keys %h) { print $_, ": ", join(", ", @{$h{$_}}) if @{$h{$_}} > 1 }}' src/plugin/parse-tika/plugin.xml 
jhighlight: jhighlight-1.0.2.jar, jhighlight-1.0.jar
commons-compress: commons-compress-1.8.1.jar, commons-compress-1.9.jar
metadata-extractor: metadata-extractor-2.6.2.jar, metadata-extractor-2.8.0.jar
commons-codec: commons-codec-1.6.jar, commons-codec-1.9.jar
slf4j-api: slf4j-api-1.6.1.jar, slf4j-api-1.7.12.jar
fontbox: fontbox-1.8.8.jar, fontbox-1.8.9.jar
jempbox: jempbox-1.8.8.jar, jempbox-1.8.9.jar
pdfbox: pdfbox-1.8.8.jar, pdfbox-1.8.9.jar
tika-parsers: tika-parsers-1.7.jar, tika-parsers-1.8.jar
{noformat}
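
For anyone who finds the Perl one-liner hard to read, the same check can be sketched in Python. This is only an illustration: the function name and the sample XML fragment are made up here, and the pattern is adapted from the one above (it uses {{[^"]}} instead of {{.}} so a match cannot run past the closing quote). Like the Perl version, it only recognises purely numeric versions.

```python
import re
from collections import defaultdict

# Adapted from the Perl pattern: group 1 is the full jar name, group 2 the
# base name without the version. Only numeric versions such as "1.8.9" are
# recognised; a name like "poi-3.12-beta1.jar" is skipped.
LIBRARY_RE = re.compile(r'<library name="(([^"]+?)-[.0-9]*\.jar)"')


def find_duplicate_libraries(plugin_xml_text):
    """Return {base name: [jar names]} for base names listed more than once."""
    jars_by_base = defaultdict(list)
    for jar, base in LIBRARY_RE.findall(plugin_xml_text):
        jars_by_base[base].append(jar)
    return {base: jars for base, jars in jars_by_base.items() if len(jars) > 1}


if __name__ == "__main__":
    # Hypothetical fragment, not the real plugin.xml.
    sample = """
    <library name="tika-parsers-1.7.jar"/>
    <library name="tika-parsers-1.8.jar"/>
    <library name="xz-1.5.jar"/>
    """
    for base, jars in find_duplicate_libraries(sample).items():
        print(f"{base}: {', '.join(jars)}")
```

To run it against the real file, read src/plugin/parse-tika/plugin.xml into a string and pass it to the function.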

What exactly should be listed in plugin.xml? All libs placed by ant/ivy in runtime/local/plugins/parse-tika? That's currently 66! If so, there is even more to do. Here is the difference between the jars listed only in plugin.xml (left), the jars found only in the plugin folder (middle), and the jars common to both (right):
{noformat}
% ls runtime/local/plugins/parse-tika/ | grep -v '^parse-tika\.jar$' | grep -v plugin.xml | sort >/tmp/tika_jars.txt
% perl -lne 'print $1 if /<library name="((.+?)-[.0-9]*\.jar)"/' src/plugin/parse-tika/plugin.xml | sort | comm - /tmp/tika_jars.txt
                                apache-mime4j-core-0.7.2.jar
                                apache-mime4j-dom-0.7.2.jar
                                asm-debug-all-4.1.jar
                                aspectjrt-1.8.0.jar
bcmail-jdk15-1.45.jar
                                bcmail-jdk15on-1.52.jar
                                bcpkix-jdk15on-1.52.jar
bcprov-jdk15-1.45.jar
                                bcprov-jdk15on-1.52.jar
                                boilerpipe-1.1.0.jar
                                bzip2-0.9.1.jar
                                c3p0-0.9.1.1.jar
                                cdm-4.5.5.jar
commons-codec-1.6.jar
                                commons-codec-1.9.jar
commons-compress-1.8.1.jar
                                commons-compress-1.9.jar
                                commons-csv-1.0.jar
commons-httpclient-3.1.jar
                                commons-logging-1.1.1.jar
                                commons-logging-api-1.1.jar
                                commons-vfs2-2.0.jar
                                ehcache-core-2.6.2.jar
fontbox-1.8.8.jar
                                fontbox-1.8.9.jar
                                grib-4.5.5.jar
                                guava-10.0.1.jar
                                httpclient-4.2.6.jar
                                httpcore-4.2.5.jar
                                httpmime-4.2.6.jar
                                httpservices-4.5.5.jar
                                isoparser-1.0.2.jar
                                java-libpst-0.8.1.jar
                                jcip-annotations-1.0.jar
                                jcommander-1.35.jar
                                jdom-1.0.jar
                                jdom2-2.0.4.jar
jempbox-1.8.8.jar
                                jempbox-1.8.9.jar
                                jhighlight-1.0.2.jar
jhighlight-1.0.jar
                                jj2000-5.2.jar
                                jmatio-1.0.jar
                                jna-4.1.0.jar
                                joda-time-2.2.jar
                                jsoup-1.7.2.jar
                                jsr305-1.3.9.jar
                                juniversalchardet-1.0.3.jar
                                junrar-0.7.jar
                                maven-scm-api-1.4.jar
                                maven-scm-provider-svn-commons-1.4.jar
                                maven-scm-provider-svnexe-1.4.jar
metadata-extractor-2.6.2.jar
                                metadata-extractor-2.8.0.jar
netcdf-4.2.20.jar
                                netcdf4-4.5.5.jar
pdfbox-1.8.8.jar
                                pdfbox-1.8.9.jar
                                plexus-utils-1.5.6.jar
poi-3.11.jar
                poi-3.12-beta1.jar
poi-ooxml-3.11.jar
                poi-ooxml-3.12-beta1.jar
poi-ooxml-schemas-3.11.jar
                poi-ooxml-schemas-3.12-beta1.jar
poi-scratchpad-3.11.jar
                poi-scratchpad-3.12-beta1.jar
                                protobuf-java-2.5.0.jar
                                quartz-2.2.0.jar
                                regexp-1.3.jar
                                rome-0.9.jar
slf4j-api-1.6.1.jar
                                slf4j-api-1.7.12.jar
                                sqlite-jdbc-3.8.6.jar
                                tagsoup-1.2.1.jar
tika-parsers-1.7.jar
                                tika-parsers-1.8.jar
                                udunits-4.5.5.jar
unidataCommon-4.2.20.jar
                                vorbis-java-core-0.6.jar
                                vorbis-java-tika-0.6.jar
xercesImpl-2.8.1.jar
xml-apis-1.3.03.jar
                                xmlbeans-2.6.0.jar
                                xmpcore-5.1.2.jar
                                xz-1.5.jar
{noformat}
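
The three columns correspond to plain set differences and an intersection. A minimal Python sketch of that comparison follows; the function name and dictionary keys are made up for illustration, and the sample lists are a small excerpt of the names in the listing above.

```python
def compare_jar_lists(declared, present):
    """Split jar names into the three columns printed by comm."""
    declared, present = set(declared), set(present)
    return {
        "only_in_plugin_xml": sorted(declared - present),
        "only_in_plugin_folder": sorted(present - declared),
        "common": sorted(declared & present),
    }


if __name__ == "__main__":
    # declared: names extracted from plugin.xml; present: names in the plugin folder.
    declared = ["fontbox-1.8.8.jar", "fontbox-1.8.9.jar", "xz-1.5.jar"]
    present = ["fontbox-1.8.9.jar", "xz-1.5.jar", "poi-3.12-beta1.jar"]
    for column, jars in compare_jar_lists(declared, present).items():
        print(f"{column}: {', '.join(jars)}")
```

Every name in the first group is a stale entry in plugin.xml, and every name in the second group is a jar that ant/ivy ships but plugin.xml does not declare.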

> parse-tika: fix dependencies in plugin.xml
> ------------------------------------------
>
>                 Key: NUTCH-2048
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2048
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.10
>            Reporter: Sebastian Nagel
>            Priority: Trivial
>             Fix For: 1.11
>
>         Attachments: NUTCH-2048_Joyce_20150723.patch
>
>
> Duplicate library dependencies listed in parse-tika's plugin.xml should be cleaned up. There are duplicates where only the version differs, e.g.:
> {noformat}
> tika-parsers-1.7.jar
> tika-parsers-1.8.jar
> {noformat}
> Not critical, because libs which are not present should simply be ignored.


