Posted to dev@pdfbox.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/05/16 12:32:59 UTC

[jira] [Comment Edited] (PDFBOX-1756) ClassCastException CosString cannot be cast to COSName

    [ https://issues.apache.org/jira/browse/PDFBOX-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998856#comment-13998856 ] 

Tim Allison edited comment on PDFBOX-1756 at 5/15/14 4:00 PM:
--------------------------------------------------------------

Shareable test document from TIKA-1252.  Same issue.

A ClassCastException now also occurs during initial loading/parsing.  It is caught and logged, and on a quick review it looks like text is still being successfully extracted.

{noformat}
 WARN [main] (COSDocument.java:302) - java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
	at org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:294)
	at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:627)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1224)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1189)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:118)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
{noformat}
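
For illustration only, here is a minimal sketch of the failure mode and of a defensive check. It assumes the failing cast is applied to a dictionary entry (such as the type entry) that is expected to hold a name but holds a string in this document; the trace above does not confirm which entry is involved. The classes Obj, NameObj and StringObj and the methods typeUnchecked/typeChecked are hypothetical stand-ins, not PDFBox classes or APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypeLookupSketch {
    // Hypothetical stand-ins for PDF object kinds; these are NOT the PDFBox classes.
    public interface Obj {}

    public static final class NameObj implements Obj {
        public final String name;
        public NameObj(String name) { this.name = name; }
    }

    public static final class StringObj implements Obj {
        public final String text;
        public StringObj(String text) { this.text = text; }
    }

    // Unchecked lookup: throws ClassCastException when "Type" holds a string.
    public static String typeUnchecked(Map<String, Obj> dict) {
        NameObj type = (NameObj) dict.get("Type");
        return type == null ? null : type.name;
    }

    // Defensive lookup: returns null unless the value really is a name.
    public static String typeChecked(Map<String, Obj> dict) {
        Obj value = dict.get("Type");
        return (value instanceof NameObj) ? ((NameObj) value).name : null;
    }

    public static void main(String[] args) {
        Map<String, Obj> wellFormed = new LinkedHashMap<>();
        wellFormed.put("Type", new NameObj("Font"));

        // Assumed shape of the malformed input: a string where a name is expected.
        Map<String, Obj> malformed = new LinkedHashMap<>();
        malformed.put("Type", new StringObj("Font"));

        System.out.println(typeChecked(wellFormed));
        System.out.println(typeChecked(malformed));
        try {
            typeUnchecked(malformed);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Whether a real fix should skip such an entry, coerce the string to a name, or log a warning is a separate decision that this sketch does not try to settle.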


was (Author: tallison@mitre.org):
Shareable test document from TIKA-1252.  Same issue.

> ClassCastException CosString cannot be cast to COSName
> ------------------------------------------------------
>
>                 Key: PDFBOX-1756
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1756
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.2
>         Environment: Ubuntu Linux & Windows 7 (both JDK6)
>            Reporter: William Palmer
>            Priority: Minor
>         Attachments: testPDF_twoAuthors.pdf
>
>
> Opening and saving a PDF causes this exception in 1.8.2:
> Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
> 	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:507)
> 	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:435)
> 	at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1122)
> 	at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:552)
> 	at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1501)
> 	at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1324)
> 	at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1305)
> The PDF is here: http://digitalcorpora.org/corp/nps/files/govdocs1/008/008677.pdf
> Code to reproduce the exception:
> PDFParser parser = new PDFParser(new FileInputStream(new File("008677.pdf")));
> parser.parse();
> File temp = File.createTempFile("temp-", ".pdf");
> parser.getPDDocument().save(temp);
> parser.getDocument().close();



--
This message was sent by Atlassian JIRA
(v6.2#6252)