Posted to user@tika.apache.org by Andreas Hubold <an...@coremedia.com> on 2021/03/31 08:14:55 UTC

Tika 1.26 update of jaxb-runtime to Jakarta EE 9

Hi,

from version 1.25 to 1.26, tika-parsers' dependency 
org.glassfish.jaxb:jaxb-runtime:jar was updated from 2.3.3 to 3.0.0.

The new version is part of Jakarta EE 9 and uses the Java package 
org.glassfish.jaxb, while version 2.3.3 (Jakarta EE 8) used package 
com.sun.xml.bind.
Its transitive dependency jakarta.xml.bind:jakarta.xml.bind-api was also 
updated from 2.3.3 to 3.0.0, which changed the package from 
javax.xml.bind to jakarta.xml.bind.
(And it also pulls in jakarta.activation from Jakarta EE 9 now)

This now causes some problems in a project that already uses Jakarta EE 
8 bind-api, version 2.3.3.

I found some background info on these versions here: 
https://www.eclipse.org/community/eclipse_newsletter/2020/november/1.php

 > mixing Jakarta EE 8 and Jakarta EE 9 APIs will cause issues with 
Maven because they both use the same Maven coordinates.

For such a case, they propose to use the old Java EE 8 artifacts with 
different Maven coordinates instead of Jakarta EE 8 artifacts. Maybe I 
could do that, but it can get very complicated in our project, because 
these artifacts are again transitive dependencies of other libraries.

This led me to the following questions:

1) Was this change intended, or did you just increase the version as 
part of general updates?

2) Which parsers need this dependency? We've excluded some parsers, and 
might be able to simply exclude the dependency as well. I haven't found 
any direct usage of this dependency in Tika. Maybe it was just added for 
version management, and is used transitively by some parser?

Thanks and regards,
Andreas





Re: Tika 1.26 update of jaxb-runtime to Jakarta EE 9

Posted by Thomas Mortagne <th...@gmail.com>.
I would add that there is a surprising dependency on
javax.xml.bind:jaxb-api 2.3.1 in the tika-parent pom, but only when
building with Java 9+ (I did not notice it before since XWiki is built
with Java 8).

> If we need to fix this and respin a 1.26.1, I'm happy to do so.

I don't think we use those specific parsers, so we are probably OK on
our side for now (though I guess it's theoretically possible to hit
those parsers, depending on the kind of files people attach, since we
don't really limit that). But if your plan is to go back to the jaxb
2.x dependency anyway, it would probably be better to do it ASAP,
before people adapt their projects based on this change :)
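
As a stopgap until then, a project that does need the CXF-based parsers
could presumably pin both artifacts back to the Jakarta EE 8 versions in
its own pom. A minimal sketch, assuming Maven and assuming nothing else
in the build requires the 3.0.0 artifacts; the 2.3.3 versions are the
ones Tika 1.25 pulled in according to this thread:

```xml
<!-- Sketch only: pins the JAXB artifacts to the versions used before Tika 1.26. -->
<!-- Assumes no other dependency in the build needs the Jakarta EE 9 (3.0.0) artifacts. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.glassfish.jaxb</groupId>
      <artifactId>jaxb-runtime</artifactId>
      <version>2.3.3</version>
    </dependency>
    <dependency>
      <groupId>jakarta.xml.bind</groupId>
      <artifactId>jakarta.xml.bind-api</artifactId>
      <version>2.3.3</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Entries in dependencyManagement override the versions of transitive
dependencies, so the tika-parsers dependency itself should not need to
change.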

On Wed, Mar 31, 2021 at 3:59 PM Tim Allison <ta...@apache.org> wrote:
>
> I should add that the safest way to use Tika is to isolate it in its
> own jvm, with tika-batch, the ForkParser or tika-server.  This
> prevents jar hell and will keep Tika from crashing your application in
> rare cases where things go wrong [1].
>
> That said, if this requires a respin of 1.26.1, I'm happy to do so.
>
> Thank you, again.
>
> Best,
>
>       Tim
>
> [1] https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika



-- 
Thomas

Re: Tika 1.26 update of jaxb-runtime to Jakarta EE 9

Posted by Tim Allison <ta...@apache.org>.
I should add that the safest way to use Tika is to isolate it in its
own JVM, with tika-batch, the ForkParser or tika-server.  This
prevents jar hell and will keep Tika from crashing your application in
rare cases where things go wrong [1].

That said, if this requires a respin of 1.26.1, I'm happy to do so.

Thank you, again.

Best,

      Tim

[1] https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika

On Wed, Mar 31, 2021 at 9:52 AM Tim Allison <ta...@apache.org> wrote:
>
> Sorry about that and thank you for raising this issue.  I upgraded the
> version 28 days ago as part of general upgrades (TIKA-3244).
>
> This dependency is only required for org.apache.cxf:cxf-rt-rs-client
> and org.apache.cxf:cxf-rt-frontend-jaxrs.
>
> In branch_1x (and 1.26), we use the cxf client in:
>
> tika-parsers:
>    GrobidRestParser which is used by the JournalParser
>    NLTKNERecogniser
>    TensorflowRESTVideoRecogniser
>
> tika-langdetect:
>     Lingo24LangDetector
>     TextLangDetector
>
>
> tika-server:
>   cxf frontend
>
> If you don't use those parsers or those langdetectors, you'll be ok
> excluding that dependency.  If we need to fix this and respin a
> 1.26.1, I'm happy to do so.
>
> Again, I'm sorry for the surprise, and thank you for the notification.
>
> Best,
>
>      Tim

Re: Tika 1.26 update of jaxb-runtime to Jakarta EE 9

Posted by Andreas Hubold <an...@coremedia.com>.
Hi Tim,

thank you, we're not using those parsers. We've even excluded cxf 
dependencies already. Excluding the Jakarta EE 9 dependencies is a good 
workaround for us now.
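
For anyone wanting to do the same, an exclusion along these lines might
look roughly as follows in a Maven pom.xml. This is only a sketch: the
excluded coordinates are the ones named earlier in this thread, the
placement on the tika-parsers dependency assumes a typical setup, and
the jakarta.activation artifact is left out because its exact
coordinates are not given here (mvn dependency:tree should show them).

```xml
<!-- Sketch only: excludes the Jakarta EE 9 JAXB artifacts that tika-parsers 1.26 pulls in. -->
<!-- Only safe if the CXF-based parsers and language detectors are not used. -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <version>1.26</version>
  <exclusions>
    <exclusion>
      <groupId>org.glassfish.jaxb</groupId>
      <artifactId>jaxb-runtime</artifactId>
    </exclusion>
    <exclusion>
      <groupId>jakarta.xml.bind</groupId>
      <artifactId>jakarta.xml.bind-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```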

An immediate fix release 1.26.1 is not necessary for us. (It might of 
course still make sense for other users.)

Best,
Andreas

Tim Allison wrote on 31.03.21 15:52:
> Sorry about that and thank you for raising this issue.  I upgraded the
> version 28 days ago as part of general upgrades (TIKA-3244).
>
> This dependency is only required for org.apache.cxf:cxf-rt-rs-client
> and org.apache.cxf:cxf-rt-frontend-jaxrs.
>
> In branch_1x (and 1.26), we use the cxf client in:
>
> tika-parsers:
>     GrobidRestParser which is used by the JournalParser
>     NLTKNERecogniser
>     TensorflowRESTVideoRecogniser
>
> tika-langdetect:
>      Lingo24LangDetector
>      TextLangDetector
>
>
> tika-server:
>    cxf frontend
>
> If you don't use those parsers or those langdetectors, you'll be ok
> excluding that dependency.  If we need to fix this and respin a
> 1.26.1, I'm happy to do so.
>
> Again, I'm sorry for the surprise, and thank you for the notification.
>
> Best,
>
>       Tim



Re: Tika 1.26 update of jaxb-runtime to Jakarta EE 9

Posted by Tim Allison <ta...@apache.org>.
Sorry about that and thank you for raising this issue.  I upgraded the
version 28 days ago as part of general upgrades (TIKA-3244).

This dependency is only required for org.apache.cxf:cxf-rt-rs-client
and org.apache.cxf:cxf-rt-frontend-jaxrs.

In branch_1x (and 1.26), we use the cxf client in:

tika-parsers:
    GrobidRestParser which is used by the JournalParser
    NLTKNERecogniser
    TensorflowRESTVideoRecogniser

tika-langdetect:
    Lingo24LangDetector
    TextLangDetector

tika-server:
    cxf frontend

If you don't use those parsers or those langdetectors, you'll be ok
excluding that dependency.  If we need to fix this and respin a
1.26.1, I'm happy to do so.

Again, I'm sorry for the surprise, and thank you for the notification.

Best,

     Tim

On Wed, Mar 31, 2021 at 4:15 AM Andreas Hubold
<an...@coremedia.com> wrote:
>
> Hi,
>
> from version 1.25 to 1.26, tika-parsers' dependency
> org.glassfish.jaxb:jaxb-runtime:jar was updated from 2.3.3 to 3.0.0.
>
> The new version is part of Jakarta EE 9 and uses the Java package
> org.glassfish.jaxb, while version 2.3.3 (Jakarta EE 8) used package
> com.sun.xml.bind.
> Its transitive dependency jakarta.xml.bind:jakarta.xml.bind-api was also
> updated from 2.3.3 to 3.0.0, which changed the package from
> javax.xml.bind to jakarta.xml.bind.
> (And it also pulls in jakarta.activation from Jakarta EE 9 now)
>
> This now causes some problems in a project that already uses Jakarta EE
> 8 bind-api, version 2.3.3.
>
> I found some background info on these versions here:
> https://www.eclipse.org/community/eclipse_newsletter/2020/november/1.php
>
>  > mixing Jakarta EE 8 and Jakarta EE 9 APIs will cause issues with
> Maven because they both use the same Maven coordinates.
>
> For such a case, they propose to use the old Java EE 8 artifacts with
> different Maven coordinates instead of Jakarta EE 8 artifacts. Maybe I
> could do that, but it can get very complicated in our project, because
> these artifacts are again transitive dependencies of other libraries.
>
> This led me to the following questions:
>
> 1) Was this change intended, or did you just increase the version as
> part of general updates?
>
> 2) Which parsers need this dependency? We've excluded some parsers, and
> might be able to simply exclude the dependency as well. I haven't found
> any direct usage of this dependency in Tika. Maybe it was just added for
> version management, and is used transitively by some parser?
>
> Thanks and regards,
> Andreas
>
>
>
>

Re: Tika 1.26 update of jaxb-runtime to Jakarta EE 9

Posted by Thomas Mortagne <th...@gmail.com>.
I was about to send pretty much the same mail, so I would love to see
answers to Andreas's questions :)

Note that I excluded the bind-api and jaxb-runtime dependencies and
did not notice any issues in the tests on various files we have in
XWiki. But those tests definitely don't try all parsers, so I'm not
very comfortable releasing this without more information.

On Wed, Mar 31, 2021 at 10:15 AM Andreas Hubold
<an...@coremedia.com> wrote:
>
> Hi,
>
> from version 1.25 to 1.26, tika-parsers' dependency
> org.glassfish.jaxb:jaxb-runtime:jar was updated from 2.3.3 to 3.0.0.
>
> The new version is part of Jakarta EE 9 and uses the Java package
> org.glassfish.jaxb, while version 2.3.3 (Jakarta EE 8) used package
> com.sun.xml.bind.
> Its transitive dependency jakarta.xml.bind:jakarta.xml.bind-api was also
> updated from 2.3.3 to 3.0.0, which changed the package from
> javax.xml.bind to jakarta.xml.bind.
> (And it also pulls in jakarta.activation from Jakarta EE 9 now)
>
> This now causes some problems in a project that already uses Jakarta EE
> 8 bind-api, version 2.3.3.
>
> I found some background info on these versions here:
> https://www.eclipse.org/community/eclipse_newsletter/2020/november/1.php
>
>  > mixing Jakarta EE 8 and Jakarta EE 9 APIs will cause issues with
> Maven because they both use the same Maven coordinates.
>
> For such a case, they propose to use the old Java EE 8 artifacts with
> different Maven coordinates instead of Jakarta EE 8 artifacts. Maybe I
> could do that, but it can get very complicated in our project, because
> these artifacts are again transitive dependencies of other libraries.
>
> This led me to the following questions:
>
> 1) Was this change intended, or did you just increase the version as
> part of general updates?
>
> 2) Which parsers need this dependency? We've excluded some parsers, and
> might be able to simply exclude the dependency as well. I haven't found
> any direct usage of this dependency in Tika. Maybe it was just added for
> version management, and is used transitively by some parser?
>
> Thanks and regards,
> Andreas
>
>
>
>


-- 
Thomas