Posted to dev@nutch.apache.org by Sebastian Nagel <wa...@googlemail.com> on 2012/08/09 23:38:41 UTC

duplicate jar files by plugin dependencies

Hi,

I just discovered that some jar files
in the bin package (1.5.1) and also in nutch.job
are packed twice:

n   jar file                              found in (lib or plugin)
2   commons-logging-1.1.1.jar             lib parse-tika
2   geronimo-stax-api_1.0_spec-1.0.1.jar  lib parse-tika
2   tagsoup-1.2.1.jar                     parse-html parse-tika
2   tika-core-1.2.jar                     lib parse-tika
2   xercesImpl-2.9.1.jar                  lib lib-xml
1   commons-codec-1.4.jar                 lib
1   commons-codec-1.5.jar                 parse-tika
1   commons-net-1.2.2.jar                 protocol-ftp
1   commons-net-1.4.1.jar                 lib
1   jdom-1.0.jar                          parse-tika
1   jdom-1.1.jar                          lib-xml
1   jetty-util5-6.1.22.jar                lib
1   jetty-util-6.1.26.jar                 lib
1   rome-0.9.jar                          parse-tika
1   rome-1.0.0.jar                        feed
1   servlet-api-2.5-20081211.jar          lib
1   servlet-api-2.5-6.1.14.jar            lib
1   slf4j-api-1.5.6.jar                   parse-tika
1   slf4j-api-1.6.1.jar                   lib
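
A listing like the one above can be reproduced with a short script. The following is a minimal sketch in Python, not the command that produced the original listing: the function names jar_locations, duplicated and report_lines are made up for illustration, and it assumes the package has been unpacked so that lib/ and plugins/<name>/ are ordinary directories under one root. It reports directories relative to that root (e.g. plugins/parse-tika rather than parse-tika) and does not look inside nutch.job.

```python
from collections import defaultdict
from pathlib import Path


def jar_locations(root):
    """Map each jar file name to the sorted directories (relative to root) that contain it."""
    root = Path(root)
    found = defaultdict(list)
    for jar in root.rglob("*.jar"):
        found[jar.name].append(jar.parent.relative_to(root).as_posix())
    return {name: sorted(dirs) for name, dirs in found.items()}


def duplicated(locations):
    """Keep only the jar names that occur in more than one directory."""
    return {name: dirs for name, dirs in locations.items() if len(dirs) > 1}


def report_lines(locations):
    """Format one 'count  name  directories' line per jar, most frequent first."""
    ordered = sorted(locations.items(), key=lambda item: (-len(item[1]), item[0]))
    return [f"{len(dirs)}   {name}   {' '.join(dirs)}" for name, dirs in ordered]
```

Calling print("\n".join(report_lines(jar_locations("/path/to/unpacked/package")))) prints the whole listing (the path is a placeholder). Passing the result through duplicated() first restricts it to jars that are packed more than once.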

Is it possible to avoid the duplicates?
At least, for same versions?

Is my understanding right?
All jars in lib/ are on the classpath even when a plugin is loaded.
So classes from tika-core*.jar are always loaded from lib/tika-core*.jar
and plugins/parse-tika/tika-core*.jar is never used.

Sebastian

Re: duplicate jar files by plugin dependencies

Posted by Julien Nioche <li...@gmail.com>.
+1 to using the maven-dependency-plugin within our Ant script. I think I
put a preliminary version for 1.x in JIRA, but we'd need to extend the
mechanism to the plugins as well.

Re: duplicate jar files by plugin dependencies

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Seb,

On Thu, Aug 9, 2012 at 10:38 PM, Sebastian Nagel
<wa...@googlemail.com> wrote:
> Hi,
>
> I just discovered that some jar files
> in the bin package (1.5.1) and also in nutch.job
> are packed twice:

OK, so there is currently an open issue for something similar to
this. I think the ticket is open for 2.x (I cannot confirm, as Jira is
temporarily down while being migrated to a dedicated slave). You raise a
very valid point though. I would like to ask the following:

1) Can we confirm that some classes are always loaded from /lib as
opposed to plugins/parse-tika/tika-core.jar, for example?
2) My feeling is that this cannot always be the case. We have some
plugins, e.g. automaton and parse-swf, where there is a dedicated jar
file available in plugins/regex-automaton/lib, for example.
3) Another problem, as you highlight, is that we have duplicate versions
of various jars which are pulled in as transitive dependencies when we
fetch deps with Ivy. To resolve this we need to open a dependency
deduplication session and sort this out. I am doing this over at Gora
atm, and once I have an accurate and reasonable way to do it
(maven-dependency-plugin?) I will try a similar approach with
Nutch. This should also address the open Jira issue.
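
To see which artifacts are affected by point 3, the jar names can be grouped by artifact with the version stripped off. A minimal sketch in Python follows. The helper names split_jar_name and conflicting_versions are hypothetical, and the rule "the version starts at the first hyphen followed by a digit" is a heuristic assumption, not something Ivy or Maven guarantees. It only reports conflicts; deciding which version to keep is still manual work.

```python
import re
from collections import defaultdict

# Heuristic (an assumption): artifact name, then "-", then a version starting with a digit.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def split_jar_name(file_name):
    """Return (artifact, version), or None if the name does not fit the heuristic."""
    match = JAR_NAME.match(file_name)
    return (match.group("artifact"), match.group("version")) if match else None


def conflicting_versions(file_names):
    """Map each artifact that appears in more than one version to its sorted versions."""
    versions = defaultdict(set)
    for file_name in file_names:
        parts = split_jar_name(file_name)
        if parts is not None:
            versions[parts[0]].add(parts[1])
    return {artifact: sorted(found) for artifact, found in versions.items() if len(found) > 1}
```

With the names from Sebastian's listing this flags commons-codec, commons-net, jdom, rome, servlet-api and slf4j-api. jetty-util5 and jetty-util are treated as different artifacts because their names differ. Versions are sorted as plain strings, which is good enough for display only.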

Lewis