Posted to dev@pdfbox.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/05/16 12:32:59 UTC
[jira] [Comment Edited] (PDFBOX-1756) ClassCastException CosString cannot be cast to COSName
[ https://issues.apache.org/jira/browse/PDFBOX-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998856#comment-13998856 ]
Tim Allison edited comment on PDFBOX-1756 at 5/15/14 4:00 PM:
--------------------------------------------------------------
Shareable test document from TIKA-1252. Same issue.
A ClassCastException now also occurs during initial loading/parsing. It is caught and logged, and on a quick review it looks like text is still being extracted successfully.
{noformat}
WARN [main] (COSDocument.java:302) - java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
at org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:294)
at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:627)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1224)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1189)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:118)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
{noformat}
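For illustration only, here is a minimal, self-contained sketch of the kind of failure the trace suggests. It assumes the problem is a dictionary entry (here "Type") that the code expects to be a name but that the file stores as a string. The classes Base, Name and Str are hypothetical stand-ins for COSBase, COSName and COSString, and typeUnchecked/typeGuarded are made-up helpers; none of this is actual PDFBox code or a proposed patch.

```java
import java.util.HashMap;
import java.util.Map;

class TypeEntrySketch {
    // Hypothetical stand-ins for COSBase, COSName and COSString.
    static class Base {}
    static final class Name extends Base {
        final String value;
        Name(String value) { this.value = value; }
    }
    static final class Str extends Base {
        final String value;
        Str(String value) { this.value = value; }
    }

    // Unchecked cast: throws ClassCastException when "Type" holds a Str.
    static String typeUnchecked(Map<String, Base> dict) {
        Name type = (Name) dict.get("Type");
        return type == null ? null : type.value;
    }

    // Guarded lookup: a non-name "Type" entry is treated as "no type".
    static String typeGuarded(Map<String, Base> dict) {
        Base type = dict.get("Type");
        return (type instanceof Name) ? ((Name) type).value : null;
    }

    public static void main(String[] args) {
        // Assumed malformed input: "Type" stored as a string, not a name.
        Map<String, Base> malformed = new HashMap<>();
        malformed.put("Type", new Str("Catalog"));
        try {
            typeUnchecked(malformed);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed");
        }
        System.out.println("guarded lookup: " + typeGuarded(malformed));
    }
}
```

Whether a string-valued entry should be skipped, as in this sketch, or coerced to a name is a design decision for the parser and writer; the sketch only shows the instanceof guard.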
was (Author: tallison@mitre.org):
Shareable test document from TIKA-1252. Same issue.
> ClassCastException CosString cannot be cast to COSName
> ------------------------------------------------------
>
> Key: PDFBOX-1756
> URL: https://issues.apache.org/jira/browse/PDFBOX-1756
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.2
> Environment: Ubuntu Linux & Windows 7 (both JDK6)
> Reporter: William Palmer
> Priority: Minor
> Attachments: testPDF_twoAuthors.pdf
>
>
> Opening and saving a PDF causes this exception in 1.8.2:
> Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
> at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:507)
> at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:435)
> at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1122)
> at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:552)
> at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1501)
> at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1324)
> at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1305)
> The PDF is here: http://digitalcorpora.org/corp/nps/files/govdocs1/008/008677.pdf
> Code to reproduce the exception:
> PDFParser parser = new PDFParser(new FileInputStream(new File("008677.pdf")));
> parser.parse();
> File temp = File.createTempFile("temp-", ".pdf");
> parser.getPDDocument().save(temp);
> parser.getDocument().close();
--
This message was sent by Atlassian JIRA
(v6.2#6252)