Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (Jira)" <ji...@apache.org> on 2020/07/31 06:02:00 UTC

[jira] [Comment Edited] (PDFBOX-4927) IllegalStateException: Expected 'Page' but found COSName{Annot} in PDPageTree.sanitizeType

    [ https://issues.apache.org/jira/browse/PDFBOX-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168408#comment-17168408 ] 

Andreas Lehmkühler edited comment on PDFBOX-4927 at 7/31/20, 6:01 AM:
----------------------------------------------------------------------

According to the PDF spec, the offsets within an object stream shall be in ascending order, but obviously we can't rely on that. Because object streams are parsed sequentially, we need those offsets in ascending order; otherwise the objects get mixed up. I've added a TreeMap to sort the offsets and ensure the required ordering.
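The following is a minimal sketch of why the ordering matters, assuming a simplified model. It is not the actual PDFBox patch, and the class and method names (ObjectStreamSketch, byOffset, bodiesInStreamOrder, assign) are hypothetical. The decoded object stream is modelled as a string: a header of "objectNumber offset" pairs, followed by the object bodies, with each offset relative to /First. A sequential reader meets the bodies in ascending byte order. If those bodies are paired with object numbers in header order, an out-of-order header hands one object's dictionary to another object. The toy data is chosen so that object 10 ends up with an /Annot dictionary, which is one way an "Expected 'Page' but found COSName{Annot}" error could arise. Keying a TreeMap by offset makes the pairing follow stream order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not PDFBox code.
public class ObjectStreamSketch {

    /** Object numbers in the order the header lists them. */
    static List<Long> headerOrder(String header) {
        String[] t = header.trim().split("\\s+");
        List<Long> nums = new ArrayList<>();
        for (int i = 0; i + 1 < t.length; i += 2) {
            nums.add(Long.parseLong(t[i]));
        }
        return nums;
    }

    /** Offset -> object number; a TreeMap iterates its keys in ascending order. */
    static TreeMap<Integer, Long> byOffset(String header) {
        String[] t = header.trim().split("\\s+");
        TreeMap<Integer, Long> map = new TreeMap<>();
        for (int i = 0; i + 1 < t.length; i += 2) {
            map.put(Integer.parseInt(t[i + 1]), Long.parseLong(t[i]));
        }
        return map;
    }

    /** Object bodies in the order a sequential reader meets them (ascending offset). */
    static List<String> bodiesInStreamOrder(String decoded, int first, TreeMap<Integer, Long> offsets) {
        List<String> bodies = new ArrayList<>();
        for (Integer off : offsets.keySet()) {
            Integer next = offsets.higherKey(off);
            int end = (next == null) ? decoded.length() : first + next;
            bodies.add(decoded.substring(first + off, end).trim());
        }
        return bodies;
    }

    /** Pairs object numbers, in the given order, with the sequentially read bodies. */
    static Map<Long, String> assign(List<Long> objectNumbers, List<String> bodies) {
        Map<Long, String> result = new TreeMap<>();
        for (int i = 0; i < objectNumbers.size(); i++) {
            result.put(objectNumbers.get(i), bodies.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Header lists object 11 (offset 15) before object 10 (offset 0): not ascending.
        String header = "11 15 10 0 ";
        String decoded = header + "<</Type/Page>> <</Type/Annot>>";
        int first = header.length(); // plays the role of /First in this toy stream

        TreeMap<Integer, Long> offsets = byOffset(header);
        List<String> bodies = bodiesInStreamOrder(decoded, first, offsets);

        // Header order: object 10 receives the Annot dictionary.
        System.out.println(assign(headerOrder(header), bodies));
        // prints {10=<</Type/Annot>>, 11=<</Type/Page>>}

        // Offset order (TreeMap values): object 10 receives the Page dictionary.
        System.out.println(assign(new ArrayList<>(offsets.values()), bodies));
        // prints {10=<</Type/Page>>, 11=<</Type/Annot>>}
    }
}
```

Note that keying the map by offset means two header entries with the same offset would overwrite each other; the sketch does not handle that case.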


was (Author: lehmi):
According to the PDF spec, the offsets within an object stream shall be in ascending order, but obviously we can't rely on that. Due to the sequential parsing, we need those offsets in ascending order; otherwise the objects get mixed up. I've added a TreeMap to sort the offsets to ensure the needed ordering.

> IllegalStateException: Expected 'Page' but found COSName{Annot} in PDPageTree.sanitizeType
> ------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.20
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>              Labels: regression
>             Fix For: 2.0.21
>
>         Attachments: 3DDNDTVSP354Z72MXOJKUXVDNN7LFCPY.pdf, FYN6FCSAV5PI2WI5I5MSUOU3UJGOYUE4.pdf, M2BEIK4UWALXYEIHCRCJFASR222EEOLV.pdf
>
>
> {noformat}
> Exception in thread "main" java.lang.IllegalStateException: Expected 'Page' but found COSName{Annot}
>         at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:250)
>         at org.apache.pdfbox.pdmodel.PDPageTree.access$300(PDPageTree.java:41)
>         at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:210)
>         at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:170)
>         at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:320)
>         at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:272)
>         at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:377)
>         at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:274)
>         at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:97)
>         at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60) {noformat}
> File works in 2.0.20 and in the trunk.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
