Posted to dev@pdfbox.apache.org by "Nicolas M (JIRA)" <ji...@apache.org> on 2017/11/20 16:32:00 UTC

[jira] [Created] (PDFBOX-4019) Expected 'Page' but found COSName{Font} in PDPageTree

Nicolas M created PDFBOX-4019:
---------------------------------

             Summary: Expected 'Page' but found COSName{Font} in PDPageTree
                 Key: PDFBOX-4019
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4019
             Project: PDFBox
          Issue Type: Improvement
          Components: PDModel, Text extraction
    Affects Versions: 2.0.8
         Environment: Debian 9 / macOS (not OS related)
            Reporter: Nicolas M
         Attachments: Sterlite Technologies.pdf

Hello,

I have a PDF document (attached) that produces the following stack trace during text extraction:

{code:java}
INFO: OpenType Layout tables used in font FreeSans are not implemented in PDFBox and will be ignored
Exception in thread "Thread-1" java.lang.IllegalStateException: Expected 'Page' but found COSName{Font}
	at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:227)
	at org.apache.pdfbox.pdmodel.PDPageTree.access$300(PDPageTree.java:38)
	at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:189)
	at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:153)
	at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:314)
	at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
	at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:227)
{code}

I found a similar problem reported on the users mailing list: https://mail-archives.apache.org/mod_mbox/pdfbox-users/201610.mbox/%3C2e858989-2fb9-d000-5320-b644fcc71f81@t-online.de%3E

I understand that the problem comes from the PDF itself, but given that some readers recover from it, are there any plans to add similar recovery to PDFBox?
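Until PDFBox offers built-in recovery, one possible stopgap is to prune the offending entries from the page tree before extracting text. The sketch below assumes PDFBox 2.0.x on the classpath. `PageTreeRepair` and `removeNonPageLeaves` are hypothetical names, not PDFBox API. The sketch tries to mirror how the 2.0.x `PDPageTree` iterator appears to classify nodes: a dictionary with `/Type /Pages` or a `/Kids` entry is treated as an intermediate node, and any other leaf must have `/Type /Page` or no `/Type` at all. Leaves such as the `/Font` dictionary in the trace above are removed. `/Count` is deliberately left untouched, and the code has not been tested against the attached file.

```java
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper (not part of PDFBox): prunes page-tree leaves whose
// /Type is something other than /Page, e.g. a stray /Font dictionary.
public class PageTreeRepair {

    /** Removes invalid leaves below the given /Pages root; returns how many were removed. */
    public static int removeNonPageLeaves(COSDictionary root) {
        Set<COSDictionary> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        return prune(root, visited);
    }

    private static int prune(COSDictionary node, Set<COSDictionary> visited) {
        if (!visited.add(node)) {
            return 0; // already seen: guards against cyclic /Kids references
        }
        COSBase kidsBase = node.getDictionaryObject(COSName.KIDS);
        if (!(kidsBase instanceof COSArray)) {
            return 0;
        }
        COSArray kids = (COSArray) kidsBase;
        int removed = 0;
        // Walk backwards so that removing an entry does not shift unvisited indices.
        for (int i = kids.size() - 1; i >= 0; i--) {
            COSBase kid = kids.getObject(i);
            if (!(kid instanceof COSDictionary)) {
                continue; // non-dictionary kids are left for PDFBox to handle
            }
            COSDictionary dict = (COSDictionary) kid;
            COSName type = dict.getCOSName(COSName.TYPE);
            if (COSName.PAGES.equals(type) || dict.containsKey(COSName.KIDS)) {
                removed += prune(dict, visited); // intermediate node: recurse into it
            } else if (type != null && !COSName.PAGE.equals(type)) {
                kids.remove(i); // a leaf that is not a page, e.g. /Type /Font
                removed++;
            }
            // A leaf without /Type is kept; the 2.0.x iterator treats it as a page.
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            int removed = removeNonPageLeaves(doc.getPages().getCOSObject());
            System.err.println("Removed " + removed + " non-page node(s) from the page tree");
            System.out.println(new PDFTextStripper().getText(doc));
        }
    }
}
```

For example, `java PageTreeRepair "Sterlite Technologies.pdf"` would print the extracted text after pruning. The repair is done in place on the COS objects rather than inside the iterator, so the rest of the text-extraction code path stays unchanged. The trade-off is that whatever the stray node was meant to be is silently discarded.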

Thanks


