Posted to user@tika.apache.org by Markus Jelsma <ma...@openindex.io> on 2020/03/02 14:44:23 UTC
Unable to parse PDF due to NoSuchFieldError: HAS_XMP
Hello,
I recently upgraded to the latest Tika and am no longer able to parse PDFs, at least not the 6 files I just tested, due to:
Caused by: java.lang.NoSuchFieldError: HAS_XMP
at org.apache.tika.parser.pdf.PDMetadataExtractor.extract(PDMetadataExtractor.java:60)
at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:227)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:147)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
Trying to work around the problem, I upgraded PDFBox from 2.0.17 to 2.0.19, but this did not help.
There are no other PDFBox libraries anywhere on the classpath.
Any suggestions?
Many thanks,
Markus
Re: Unable to parse PDF due to NoSuchFieldError: HAS_XMP
Posted by Tim Allison <ta...@apache.org>.
Yes, that's a Tika field. Is there a chance that your tika-parsers version
does not match your tika-core version? Which versions of each are you
using?
If this is a problem with Tika, we'll have time to fix it before the 1.24
release...coming soon...
Cheers,
Tim
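
One way to check the version question raised above is to ask the JVM which jar each class was actually loaded from. The following is only a minimal sketch, not something taken from this thread: the class name JarOrigin and its two helper methods are hypothetical, and the guess that HAS_XMP is declared on org.apache.tika.metadata.PDF in tika-core is an assumption that should be verified against the Tika source for your version.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Tika.
class JarOrigin {

    /** Reports the jar (code source) and manifest version a class was loaded from. */
    public static String describe(String className) {
        try {
            // Load without running static initializers.
            Class<?> c = Class.forName(className, false, JarOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "(bootstrap or unknown)"
                    : src.getLocation().toString();
            Package p = c.getPackage();
            // May be null if the jar manifest does not declare a version.
            String version = (p == null) ? null : p.getImplementationVersion();
            return className + " -> " + location + " (Implementation-Version: " + version + ")";
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not loadable: " + e;
        }
    }

    /** Returns true if the named class is loadable and has a public field with that name. */
    public static boolean hasField(String className, String fieldName) {
        try {
            Class.forName(className, false, JarOrigin.class.getClassLoader()).getField(fieldName);
            return true;
        } catch (ReflectiveOperationException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // org.apache.tika.Tika ships in tika-core, PDFParser in tika-parsers.
        System.out.println(describe("org.apache.tika.Tika"));
        System.out.println(describe("org.apache.tika.parser.pdf.PDFParser"));
        // Assumption: HAS_XMP lives on org.apache.tika.metadata.PDF (tika-core).
        System.out.println("HAS_XMP present: "
                + hasField("org.apache.tika.metadata.PDF", "HAS_XMP"));
    }
}
```

Run it with the same classpath as the failing application. If the two jars reported come from different Tika releases, or the field check prints false, an older tika-core is probably shadowing the one that the newer tika-parsers was compiled against.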