Posted to user@tika.apache.org by Luís Filipe Nassif <lf...@gmail.com> on 2022/07/12 17:09:05 UTC

Question about Tika-2.4.x configuration in project pom.xml

Hi all,

We recently upgraded to Tika-2.4.0 and followed the wiki 2.0 migration
guide. It recommends declaring e.g.:

<dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers-standard-package</artifactId>
      <version>2.4.1</version>
</dependency>

But when I look at my project's dependency deployment folder, I see many
module jars (e.g. tika-parser-image-module-2.4.1.jar,
tika-parser-mail-module-2.4.1.jar, tika-parser-office-module-2.4.1.jar,
etc.) alongside tika-parsers-standard-package-2.4.1.jar, which already
contains most of the packages and classes from the module jars, so many
classes are duplicated.

I'm not a Maven expert... Is that expected? Is there any configuration
to fix the dependency duplication above?

Thanks in advance,
Luís Nassif
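
One way to confirm the duplication described above is to list every class entry that appears in more than one jar of the deployment folder. The following is only a minimal sketch, not part of Tika or Maven: the class name DuplicateClassFinder and the default folder "target/lib" are hypothetical, so pass your own folder as the first argument.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: reports .class entries found in more than one jar.
public class DuplicateClassFinder {

    /** Maps each .class entry name to the jars in libDir that contain it. */
    static Map<String, List<String>> index(Path libDir) throws IOException {
        Map<String, List<String>> owners = new TreeMap<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                try (ZipFile zip = new ZipFile(jar.toFile())) {
                    Enumeration<? extends ZipEntry> entries = zip.entries();
                    while (entries.hasMoreElements()) {
                        String name = entries.nextElement().getName();
                        // module-info.class legitimately exists in many jars, so skip it.
                        if (name.endsWith(".class") && !name.endsWith("module-info.class")) {
                            owners.computeIfAbsent(name, k -> new ArrayList<>())
                                  .add(jar.getFileName().toString());
                        }
                    }
                }
            }
        }
        return owners;
    }

    /** Keeps only the entries that occur in two or more jars. */
    static Map<String, List<String>> duplicates(Path libDir) throws IOException {
        Map<String, List<String>> dups = index(libDir);
        dups.values().removeIf(jars -> jars.size() < 2);
        return dups;
    }

    public static void main(String[] args) throws IOException {
        // "target/lib" is an assumed default, not a path taken from this thread.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "target/lib");
        duplicates(libDir).forEach((cls, jars) -> System.out.println(cls + " -> " + jars));
    }
}
```

Run against the folder in question, it prints one line per duplicated class followed by the jars that contain it, which shows whether the overlap is limited to the standard package and the module jars.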

Re: Question about Tika-2.4.x configuration in project pom.xml

Posted by Tim Allison <ta...@apache.org>.
Sorry, yes, it isn't you.  It feels like this should "just work" in Maven,
but I haven't had success.  Seriously, if anyone has recommendations for
how we can clean this up, it'd be much appreciated.

On Mon, Jul 18, 2022 at 11:49 AM Luís Filipe Nassif <lf...@gmail.com>
wrote:

> Thanks, Tim. At least now I know my config is not messing things up :-)
>
> On Mon, Jul 18, 2022 at 10:18, Tim Allison <ta...@apache.org>
> wrote:
>
>> In tika-app and tika-server, we went to the hassle of excluding a bunch
>> of dependencies to avoid this.  If anyone has a cleaner solution, let's
>> implement it.  I'm not happy with this behavior in 2.x.
>>
>> On Mon, Jul 18, 2022 at 8:53 AM Luís Filipe Nassif <lf...@gmail.com>
>> wrote:
>>
>>> Any ideas?
>>>
>>> On Tue, Jul 12, 2022 at 14:09, Luís Filipe Nassif <lf...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We recently upgraded to Tika-2.4.0 and followed the wiki 2.0 migration
>>>> guide. It recommends declaring e.g.:
>>>>
>>>> <dependency>
>>>>       <groupId>org.apache.tika</groupId>
>>>>       <artifactId>tika-parsers-standard-package</artifactId>
>>>>       <version>2.4.1</version>
>>>> </dependency>
>>>>
>>>> But when I look into my project dependencies deploy folder, I see many
>>>> module jars included (e.g. tika-parser-image-module-2.4.1.jar,
>>>> tika-parser-mail-module-2.4.1.jar, tika-parser-office-module-2.4.1.jar,
>>>> etc) but also tika-parsers-standard-package-2.4.1.jar and it
>>>> already includes most packages and classes from the modules jars,
>>>> duplicating many classes.
>>>>
>>>> I'm not an expert in maven... Is that expected? Is there any
>>>> configuration to fix the dependency duplication above?
>>>>
>>>> Thanks in advance,
>>>> Luís Nassif
>>>>
>>>

Re: Question about Tika-2.4.x configuration in project pom.xml

Posted by Luís Filipe Nassif <lf...@gmail.com>.
Thanks, Tim. At least now I know my config is not messing things up :-)

On Mon, Jul 18, 2022 at 10:18, Tim Allison <ta...@apache.org> wrote:

> In tika-app and tika-server, we went to the hassle of excluding a bunch of
> dependencies to avoid this.  If anyone has a cleaner solution, let's
> implement it.  I'm not happy with this behavior in 2.x.
>
> On Mon, Jul 18, 2022 at 8:53 AM Luís Filipe Nassif <lf...@gmail.com>
> wrote:
>
>> Any ideas?
>>
>> On Tue, Jul 12, 2022 at 14:09, Luís Filipe Nassif <lf...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> We recently upgraded to Tika-2.4.0 and followed the wiki 2.0 migration
>>> guide. It recommends declaring e.g.:
>>>
>>> <dependency>
>>>       <groupId>org.apache.tika</groupId>
>>>       <artifactId>tika-parsers-standard-package</artifactId>
>>>       <version>2.4.1</version>
>>> </dependency>
>>>
>>> But when I look into my project dependencies deploy folder, I see many
>>> module jars included (e.g. tika-parser-image-module-2.4.1.jar,
>>> tika-parser-mail-module-2.4.1.jar, tika-parser-office-module-2.4.1.jar,
>>> etc) but also tika-parsers-standard-package-2.4.1.jar and it
>>> already includes most packages and classes from the modules jars,
>>> duplicating many classes.
>>>
>>> I'm not an expert in maven... Is that expected? Is there any
>>> configuration to fix the dependency duplication above?
>>>
>>> Thanks in advance,
>>> Luís Nassif
>>>
>>

Re: Question about Tika-2.4.x configuration in project pom.xml

Posted by Tim Allison <ta...@apache.org>.
In tika-app and tika-server, we went to the hassle of excluding a bunch of
dependencies to avoid this.  If anyone has a cleaner solution, let's
implement it.  I'm not happy with this behavior in 2.x.

On Mon, Jul 18, 2022 at 8:53 AM Luís Filipe Nassif <lf...@gmail.com>
wrote:

> Any ideas?
>
> On Tue, Jul 12, 2022 at 14:09, Luís Filipe Nassif <lf...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> We recently upgraded to Tika-2.4.0 and followed the wiki 2.0 migration
>> guide. It recommends declaring e.g.:
>>
>> <dependency>
>>       <groupId>org.apache.tika</groupId>
>>       <artifactId>tika-parsers-standard-package</artifactId>
>>       <version>2.4.1</version>
>> </dependency>
>>
>> But when I look into my project dependencies deploy folder, I see many
>> module jars included (e.g. tika-parser-image-module-2.4.1.jar,
>> tika-parser-mail-module-2.4.1.jar, tika-parser-office-module-2.4.1.jar,
>> etc) but also tika-parsers-standard-package-2.4.1.jar and it
>> already includes most packages and classes from the modules jars,
>> duplicating many classes.
>>
>> I'm not an expert in maven... Is that expected? Is there any
>> configuration to fix the dependency duplication above?
>>
>> Thanks in advance,
>> Luís Nassif
>>
>

Re: Question about Tika-2.4.x configuration in project pom.xml

Posted by Luís Filipe Nassif <lf...@gmail.com>.
Any ideas?

On Tue, Jul 12, 2022 at 14:09, Luís Filipe Nassif <lf...@gmail.com>
wrote:

> Hi all,
>
> We recently upgraded to Tika-2.4.0 and followed the wiki 2.0 migration
> guide. It recommends declaring e.g.:
>
> <dependency>
>       <groupId>org.apache.tika</groupId>
>       <artifactId>tika-parsers-standard-package</artifactId>
>       <version>2.4.1</version>
> </dependency>
>
> But when I look into my project dependencies deploy folder, I see many
> module jars included (e.g. tika-parser-image-module-2.4.1.jar,
> tika-parser-mail-module-2.4.1.jar, tika-parser-office-module-2.4.1.jar,
> etc) but also tika-parsers-standard-package-2.4.1.jar and it
> already includes most packages and classes from the modules jars,
> duplicating many classes.
>
> I'm not an expert in maven... Is that expected? Is there any configuration
> to fix the dependency duplication above?
>
> Thanks in advance,
> Luís Nassif
>