Posted to dev@pdfbox.apache.org by "William (Created) (JIRA)" <ji...@apache.org> on 2012/03/29 16:42:31 UTC
[jira] [Created] (PDFBOX-1273) java.io.IOException: Error: Unknown annotation type null
java.io.IOException: Error: Unknown annotation type null
--------------------------------------------------------
Key: PDFBOX-1273
URL: https://issues.apache.org/jira/browse/PDFBOX-1273
Project: PDFBox
Issue Type: Bug
Components: PDModel
Affects Versions: 1.7.0
Reporter: William
Priority: Minor
Hi,
I've come across the following exception on a very small number of documents:
org.apache.tika.exception.TikaException: Unable to extract PDF content
    at org.apache.pdfbox.tika.PDF2XHTML.process(PDF2XHTML.java:80) ~[extractor.jar:na]
    at org.apache.pdfbox.tika.PDFParser.parse(PDFParser.java:116) ~[extractor.jar:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) ~[extractor.jar:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) ~[extractor.jar:na]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) ~[extractor.jar:na]
Caused by: java.io.IOException: Error: Unknown annotation type null
    at org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation.createAnnotation(PDAnnotation.java:165) ~[extractor.jar:na]
    at org.apache.pdfbox.pdmodel.PDPage.getAnnotations(PDPage.java:785) ~[extractor.jar:na]
    at org.apache.pdfbox.tika.PDF2XHTML.endPage(PDF2XHTML.java:142) ~[extractor.jar:na]
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:450) ~[extractor.jar:na]
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372) ~[extractor.jar:na]
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328) ~[extractor.jar:na]
    at org.apache.pdfbox.tika.PDF2XHTML.process(PDF2XHTML.java:63) ~[extractor.jar:na]
Here are a few examples:
http://www.jdsupra.com/documents/01ece854-a961-4184-8de7-f6d5311d6a48.pdf
http://www.jdsupra.com/documents/0aabecb4-094a-40e4-a507-8b49ecb90a3e.pdf
http://www.jdsupra.com/documents/0d74ccf8-2d57-487d-88c2-98eee26f8236.pdf
Thanks
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PDFBOX-1273) java.io.IOException: Error: Unknown annotation type null
Posted by "William (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
William updated PDFBOX-1273:
----------------------------
Attachment: PDPageQuickFix.patch
This fixes it, but it's probably not the best fix :-)
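For anyone hitting the same trace, here is a rough sketch of the general idea behind a lenient workaround. It is not the contents of PDPageQuickFix.patch and it does not use the PDFBox API: the class LenientAnnotations, its two methods, and the map-based "annotation dictionary" are hypothetical stand-ins. The assumption is that the "null" at the end of the message means the annotation dictionary had no subtype entry, so the factory had nothing to dispatch on. The sketch skips such entries instead of letting one bad annotation abort the whole page.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LenientAnnotations {

    // Hypothetical stand-in for an annotation factory. It throws when the
    // subtype is missing or not recognised; a missing subtype is
    // concatenated as "null", which matches the message in the report.
    static String createAnnotation(Map<String, String> dict) throws IOException {
        String subtype = dict.get("Subtype");
        if ("Link".equals(subtype) || "Text".equals(subtype)) {
            return subtype + " annotation";
        }
        throw new IOException("Error: Unknown annotation type " + subtype);
    }

    // Lenient variant: build what can be built, log and skip the rest.
    static List<String> getAnnotationsLenient(List<Map<String, String>> dicts) {
        List<String> result = new ArrayList<>();
        for (Map<String, String> dict : dicts) {
            try {
                result.add(createAnnotation(dict));
            } catch (IOException e) {
                System.err.println("Skipping annotation: " + e.getMessage());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> link = new HashMap<>();
        link.put("Subtype", "Link");
        Map<String, String> broken = new HashMap<>(); // no Subtype entry
        Map<String, String> text = new HashMap<>();
        text.put("Subtype", "Text");

        List<Map<String, String>> dicts = new ArrayList<>();
        dicts.add(link);
        dicts.add(broken);
        dicts.add(text);

        // The broken entry is reported on stderr and left out of the result.
        System.out.println(getAnnotationsLenient(dicts));
    }
}
```

Whether silently skipping is acceptable depends on the caller. For text extraction it is probably fine, but a tool that rewrites the PDF might prefer to keep the unknown annotation in some generic form rather than drop it.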