Posted to user@tika.apache.org by Markus Jelsma <ma...@openindex.io> on 2020/03/02 14:44:23 UTC

Unable to parse PDF due to NoSuchFieldError: HAS_XMP

Hello,

I recently upgraded to the latest Tika and am no longer able to parse PDFs, at least not the 6 files I just tested, due to:

Caused by: java.lang.NoSuchFieldError: HAS_XMP
        at org.apache.tika.parser.pdf.PDMetadataExtractor.extract(PDMetadataExtractor.java:60)
        at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:227)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:147)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

To work around the problem, I upgraded PDFBox from 2.0.17 to 2.0.19, but this did not help.

There are no other PDFBox libraries anywhere on the classpath.

Any suggestions?

Many thanks,
Markus

Re: Unable to parse PDF due to NoSuchFieldError: HAS_XMP

Posted by Tim Allison <ta...@apache.org>.

Yes, that's a Tika field.  Is there a chance that your tika-parsers version
does not match your tika-core version?  Which versions of each are you
using?
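
One way to check for such a mismatch is to ask the JVM which jar a class was actually loaded from and whether it exposes the field in question. The following is a minimal sketch, not an official Tika tool: the class name ClasspathCheck and its helper methods are hypothetical, and the default class name org.apache.tika.metadata.PDF is an assumption about where HAS_XMP is declared. Pass a different class and field name as arguments if that assumption is wrong.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Tika or PDFBox.
class ClasspathCheck {

    // Loads the class without running its static initializers.
    static Class<?> load(String className) throws ClassNotFoundException {
        return Class.forName(className, false, ClasspathCheck.class.getClassLoader());
    }

    // True if the class is on the classpath and has a public field with this name.
    static boolean hasField(String className, String fieldName) {
        try {
            load(className).getField(fieldName);
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException e) {
            return false;
        }
    }

    // The jar or directory the class was loaded from, or a placeholder if unknown.
    static String loadedFrom(String className) {
        try {
            CodeSource src = load(className).getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(class not found)";
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; override on the command line if needed.
        String cls = args.length > 0 ? args[0] : "org.apache.tika.metadata.PDF";
        String field = args.length > 1 ? args[1] : "HAS_XMP";
        System.out.println(cls + " loaded from: " + loadedFrom(cls));
        System.out.println("has field " + field + ": " + hasField(cls, field));
    }
}
```

Run it with the same classpath as the failing application. If the reported jar is an older tika-core than the tika-parsers jar in use, or the field is reported as missing, that would point to a version mismatch.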

If this is a problem with Tika, we'll have time to fix it before the 1.24
release...coming soon...

Cheers,

            Tim
