Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/07/23 21:32:04 UTC
[jira] [Commented] (NUTCH-2048) parse-tika: fix dependencies in plugin.xml
[ https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639393#comment-14639393 ]
Sebastian Nagel commented on NUTCH-2048:
----------------------------------------
Hi, it's not that trivial. There are more duplicates:
{noformat}
% perl -lne 'push @{$h{$2}}, $1 if /<library name="((.+?)-[.0-9]*\.jar)"/;
END { for (keys %h) { print $_, ": ", join(", ", @{$h{$_}}) if @{$h{$_}} > 1 }}' src/plugin/parse-tika/plugin.xml
jhighlight: jhighlight-1.0.2.jar, jhighlight-1.0.jar
commons-compress: commons-compress-1.8.1.jar, commons-compress-1.9.jar
metadata-extractor: metadata-extractor-2.6.2.jar, metadata-extractor-2.8.0.jar
commons-codec: commons-codec-1.6.jar, commons-codec-1.9.jar
slf4j-api: slf4j-api-1.6.1.jar, slf4j-api-1.7.12.jar
fontbox: fontbox-1.8.8.jar, fontbox-1.8.9.jar
jempbox: jempbox-1.8.8.jar, jempbox-1.8.9.jar
pdfbox: pdfbox-1.8.8.jar, pdfbox-1.8.9.jar
tika-parsers: tika-parsers-1.7.jar, tika-parsers-1.8.jar
{noformat}
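For readers less fluent in Perl, the same check could be sketched in Python roughly as follows. This is only an illustration: the function name find_duplicate_libs and the sample fragment are made up, and the regular expression is copied from the one-liner above (so it shares its limits, e.g. it skips names such as poi-3.12-beta1.jar whose version is not purely digits and dots).

```python
import re
from collections import defaultdict

# Same pattern as in the Perl one-liner: group 1 is the full jar file name,
# group 2 is the library name without the trailing "-<version>.jar".
LIBRARY_RE = re.compile(r'<library name="((.+?)-[.0-9]*\.jar)"')


def find_duplicate_libs(plugin_xml_text):
    """Return {library name: [jar, ...]} for libraries listed more than once.

    Hypothetical helper, not part of Nutch.
    """
    versions = defaultdict(list)
    for line in plugin_xml_text.splitlines():
        match = LIBRARY_RE.search(line)
        if match:
            versions[match.group(2)].append(match.group(1))
    return {name: jars for name, jars in versions.items() if len(jars) > 1}


# Made-up fragment in the style of plugin.xml, for illustration only.
SAMPLE = """
<library name="tika-parsers-1.7.jar"/>
<library name="tika-parsers-1.8.jar"/>
<library name="xz-1.5.jar"/>
"""

for name, jars in find_duplicate_libs(SAMPLE).items():
    print(name + ": " + ", ".join(jars))
```

To run it against the real file, read src/plugin/parse-tika/plugin.xml into a string and pass that to find_duplicate_libs instead of SAMPLE.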
What exactly should be listed in plugin.xml? All libs that ant/ivy places in runtime/local/plugins/parse-tika? That's currently 66! If so, there is even more to do. Here is the difference between jars only in plugin.xml (left), jars only in the plugin folder (middle), and common jars (right):
{noformat}
% ls runtime/local/plugins/parse-tika/ | grep -v '^parse-tika\.jar$' | grep -v plugin.xml | sort >/tmp/tika_jars.txt
% perl -lne 'print $1 if /<library name="((.+?)-[.0-9]*\.jar)"/' src/plugin/parse-tika/plugin.xml | sort | comm - /tmp/tika_jars.txt
apache-mime4j-core-0.7.2.jar
apache-mime4j-dom-0.7.2.jar
asm-debug-all-4.1.jar
aspectjrt-1.8.0.jar
bcmail-jdk15-1.45.jar
bcmail-jdk15on-1.52.jar
bcpkix-jdk15on-1.52.jar
bcprov-jdk15-1.45.jar
bcprov-jdk15on-1.52.jar
boilerpipe-1.1.0.jar
bzip2-0.9.1.jar
c3p0-0.9.1.1.jar
cdm-4.5.5.jar
commons-codec-1.6.jar
commons-codec-1.9.jar
commons-compress-1.8.1.jar
commons-compress-1.9.jar
commons-csv-1.0.jar
commons-httpclient-3.1.jar
commons-logging-1.1.1.jar
commons-logging-api-1.1.jar
commons-vfs2-2.0.jar
ehcache-core-2.6.2.jar
fontbox-1.8.8.jar
fontbox-1.8.9.jar
grib-4.5.5.jar
guava-10.0.1.jar
httpclient-4.2.6.jar
httpcore-4.2.5.jar
httpmime-4.2.6.jar
httpservices-4.5.5.jar
isoparser-1.0.2.jar
java-libpst-0.8.1.jar
jcip-annotations-1.0.jar
jcommander-1.35.jar
jdom-1.0.jar
jdom2-2.0.4.jar
jempbox-1.8.8.jar
jempbox-1.8.9.jar
jhighlight-1.0.2.jar
jhighlight-1.0.jar
jj2000-5.2.jar
jmatio-1.0.jar
jna-4.1.0.jar
joda-time-2.2.jar
jsoup-1.7.2.jar
jsr305-1.3.9.jar
juniversalchardet-1.0.3.jar
junrar-0.7.jar
maven-scm-api-1.4.jar
maven-scm-provider-svn-commons-1.4.jar
maven-scm-provider-svnexe-1.4.jar
metadata-extractor-2.6.2.jar
metadata-extractor-2.8.0.jar
netcdf-4.2.20.jar
netcdf4-4.5.5.jar
pdfbox-1.8.8.jar
pdfbox-1.8.9.jar
plexus-utils-1.5.6.jar
poi-3.11.jar
poi-3.12-beta1.jar
poi-ooxml-3.11.jar
poi-ooxml-3.12-beta1.jar
poi-ooxml-schemas-3.11.jar
poi-ooxml-schemas-3.12-beta1.jar
poi-scratchpad-3.11.jar
poi-scratchpad-3.12-beta1.jar
protobuf-java-2.5.0.jar
quartz-2.2.0.jar
regexp-1.3.jar
rome-0.9.jar
slf4j-api-1.6.1.jar
slf4j-api-1.7.12.jar
sqlite-jdbc-3.8.6.jar
tagsoup-1.2.1.jar
tika-parsers-1.7.jar
tika-parsers-1.8.jar
udunits-4.5.5.jar
unidataCommon-4.2.20.jar
vorbis-java-core-0.6.jar
vorbis-java-tika-0.6.jar
xercesImpl-2.8.1.jar
xml-apis-1.3.03.jar
xmlbeans-2.6.0.jar
xmpcore-5.1.2.jar
xz-1.5.jar
{noformat}
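The three-way comparison that comm performs can also be sketched with Python sets, which may be easier to post-process. This is a minimal sketch, not an existing tool: compare_jar_lists is a hypothetical name, and the two input lists are tiny made-up examples (the jar names are taken from the output above, but which side each jar really belongs to is not implied).

```python
def compare_jar_lists(declared, present):
    """Split jar names into the same three groups as the columns of comm.

    declared: jar names listed in plugin.xml
    present:  jar names found in the plugin folder
    Returns (only_declared, only_present, common), each sorted.
    Hypothetical helper, not part of Nutch.
    """
    declared, present = set(declared), set(present)
    only_declared = sorted(declared - present)  # left column of comm
    only_present = sorted(present - declared)   # middle column of comm
    common = sorted(declared & present)         # right column of comm
    return only_declared, only_present, common


# Made-up inputs for illustration only.
declared = ["tika-parsers-1.7.jar", "tika-parsers-1.8.jar", "xz-1.5.jar"]
present = ["tika-parsers-1.8.jar", "xz-1.5.jar", "jna-4.1.0.jar"]

only_declared, only_present, common = compare_jar_lists(declared, present)
print("only in plugin.xml:   ", only_declared)
print("only in plugin folder:", only_present)
print("in both:              ", common)
```

Unlike comm, the sets do not require the inputs to be sorted beforehand, and repeated entries collapse into one.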
> parse-tika: fix dependencies in plugin.xml
> ------------------------------------------
>
> Key: NUTCH-2048
> URL: https://issues.apache.org/jira/browse/NUTCH-2048
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.10
> Reporter: Sebastian Nagel
> Priority: Trivial
> Fix For: 1.11
>
> Attachments: NUTCH-2048_Joyce_20150723.patch
>
>
> Duplicate library dependencies listed in parse-tika's plugin.xml should be cleaned up. There are duplicates where only the version differs, e.g.:
> {noformat}
> tika-parsers-1.7.jar
> tika-parsers-1.8.jar
> {noformat}
> Not critical, because libs that are not present should simply be ignored.