Posted to dev@tika.apache.org by Colm O hEigeartaigh <co...@apache.org> on 2021/09/30 09:30:26 UTC

CXF upgrade to Tika 2.x

Hi,

Apache CXF has a dependency on Tika. I'd like to upgrade from 1.27 to
2.x for the next major version of CXF.

I created an initial PR here: https://github.com/apache/cxf/pull/858/

However we have some failing tests that we would like your input on.
For example:

https://github.com/apache/cxf/blob/75fb6bb56d82f72771a9ee6ecab5d36168303f51/rt/rs/extensions/search/src/test/java/org/apache/cxf/jaxrs/ext/search/tika/TikaContentExtractorTest.java#L53

Is there anything obvious that changed here from 1.27 to 2.x that
would cause this to start failing?

Thanks,

Colm.

Re: CXF upgrade to Tika 2.x

Posted by Tim Allison <ta...@apache.org>.
I'm sorry for the pain.  Thank you for reaching out!

On Thu, Sep 30, 2021 at 12:09 PM Colm O hEigeartaigh
<co...@apache.org> wrote:
>
> Thanks, all tests are passing now and CXF will update to Tika 2.1.0+
> for CXF 3.5.0.
>
> Colm.

Re: CXF upgrade to Tika 2.x

Posted by Colm O hEigeartaigh <co...@apache.org>.
Thanks, all tests are passing now and CXF will update to Tika 2.1.0+
for CXF 3.5.0.

Colm.

On Thu, Sep 30, 2021 at 1:30 PM Tim Allison <ta...@apache.org> wrote:
>
> dcterms:modified
>
> https://issues.apache.org/jira/browse/TIKA-3560?focusedCommentId=17419334&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17419334

Re: CXF upgrade to Tika 2.x

Posted by Tim Allison <ta...@apache.org>.
dcterms:modified

https://issues.apache.org/jira/browse/TIKA-3560?focusedCommentId=17419334&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17419334


On Thu, Sep 30, 2021 at 8:11 AM Colm O hEigeartaigh <co...@apache.org>
wrote:

> Thanks, changing from Author to "dc:creator" fixes some of the failing
> tests. The other failing thing is "modified" - what should I change
> this to?
>
> Colm.

Re: CXF upgrade to Tika 2.x

Posted by Colm O hEigeartaigh <co...@apache.org>.
Thanks, changing from Author to "dc:creator" fixes some of the failing
tests. The other failing thing is "modified" - what should I change
this to?

Colm.

On Thu, Sep 30, 2021 at 11:56 AM Tim Allison <ta...@apache.org> wrote:
>
> Sorry, should have looked more closely at the unit test.  I think the
> answer is that we removed duplicate metadata keys and now rely only on
> the "standard" keys.  So, you should use "dc:creator" (top of my head,
> waves hands).  See a discussion here:
> https://issues.apache.org/jira/browse/TIKA-3560.

Re: CXF upgrade to Tika 2.x

Posted by Tim Allison <ta...@apache.org>.
Sorry, should have looked more closely at the unit test.  I think the
answer is that we removed duplicate metadata keys and now rely only on
the "standard" keys.  So, you should use "dc:creator" (top of my head,
waves hands).  See a discussion here:
https://issues.apache.org/jira/browse/TIKA-3560.
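
For readers migrating similar code, the key renames mentioned in this thread could be sketched roughly as below. This is only an illustration using the Java standard library, not Tika's API: the class name MetadataKeyMigration and the method toStandardKey are hypothetical, and the map holds only the two renames confirmed in this thread ("Author" to "dc:creator", "modified" to "dcterms:modified"). See TIKA-3560 for the full list of removed keys.

```java
import java.util.Map;

// Hypothetical helper, not part of Tika: translates metadata key names
// used with Tika 1.x into the "standard" keys that Tika 2.x keeps.
final class MetadataKeyMigration {

    // Only the two renames confirmed in this thread are listed here.
    private static final Map<String, String> LEGACY_TO_STANDARD = Map.of(
            "Author", "dc:creator",
            "modified", "dcterms:modified");

    // Returns the standard key for a legacy key, or the key unchanged
    // when no rename is known for it.
    static String toStandardKey(String key) {
        return LEGACY_TO_STANDARD.getOrDefault(key, key);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"Author", "modified", "dc:creator"}) {
            System.out.println(key + " -> " + toStandardKey(key));
        }
    }
}
```

A test that previously looked up "Author" or "modified" would pass the key through toStandardKey (or simply hard-code the new names) before querying the extracted metadata.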

On Thu, Sep 30, 2021 at 6:21 AM Colm O hEigeartaigh <co...@apache.org> wrote:
>
> Yes, I switched tika-parsers to tika-parsers-standard-package as part
> of the upgrade.
>
> Colm.

Re: CXF upgrade to Tika 2.x

Posted by Colm O hEigeartaigh <co...@apache.org>.
Yes, I switched tika-parsers to tika-parsers-standard-package as part
of the upgrade.

Colm.

On Thu, Sep 30, 2021 at 11:13 AM Tim Allison <ta...@apache.org> wrote:
>
> Are you pulling in tika-parsers-standard-package? That’s the new
> tika-parsers.
>
> We factored out the scientific parsers and the SQLite parser from
> tika-parsers and put those into their own packages.

Re: CXF upgrade to Tika 2.x

Posted by Tim Allison <ta...@apache.org>.
Are you pulling in tika-parsers-standard-package? That’s the new
tika-parsers.

We factored out the scientific parsers and the SQLite parser from
tika-parsers and put those into their own packages.

On Thu, Sep 30, 2021 at 5:30 AM Colm O hEigeartaigh <co...@apache.org>
wrote:

> Hi,
>
> Apache CXF has a dependency on Tika. I'd like to upgrade from 1.27 to
> 2.x for the next major version of CXF.
>
> I created an initial PR here: https://github.com/apache/cxf/pull/858/
>
> However we have some failing tests that we would like your input on.
> For example:
>
>
> https://github.com/apache/cxf/blob/75fb6bb56d82f72771a9ee6ecab5d36168303f51/rt/rs/extensions/search/src/test/java/org/apache/cxf/jaxrs/ext/search/tika/TikaContentExtractorTest.java#L53
>
> Is there anything obvious that changed here from 1.27 to 2.x that
> would cause this to start failing?
>
> Thanks,
>
> Colm.
>