Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2021/09/08 18:09:00 UTC

[jira] [Comment Edited] (PDFBOX-5278) PDPage.getAnnotations() causes subsequent calls to PDDocument.getPages() to fail

    [ https://issues.apache.org/jira/browse/PDFBOX-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412115#comment-17412115 ] 

Tilman Hausherr edited comment on PDFBOX-5278 at 9/8/21, 6:08 PM:
------------------------------------------------------------------

[~mkl] Your suspicion is correct :-)

[~alistairo] Your files are a mess. For example, in the "64" file, annotation 31 on page 7 references not an annotation dictionary but page 1. Search for "43 0 R" and "43 0 obj" in that file with Notepad++.
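To make the diagnosis concrete: each entry in a page's /Annots array is an indirect reference that should resolve to an annotation dictionary, but in the "64" file one entry on page 7 resolves to page 1's own /Page dictionary. The sketch below checks for exactly that condition. It is written against a hypothetical toy model (plain Java maps keyed by object number; the class, methods and every object number except 43 are made up), not against PDFBox classes, so a real check would have to walk the document's COS objects instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT the PDFBox API: each indirect object is a Map keyed by its
// object number, and an /Annots array is a List of object numbers ("N 0 R").
public class AnnotsCheck {

    // Build a mutable dictionary from alternating key/value arguments.
    static Map<String, Object> dict(Object... keyValues) {
        Map<String, Object> d = new HashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    // Report every /Annots entry that resolves to a /Type /Page dictionary
    // instead of an annotation dictionary.
    static List<String> findPageRefsInAnnots(Map<Integer, Map<String, Object>> objects,
                                             List<Integer> pageObjNums) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < pageObjNums.size(); i++) {
            Object annots = objects.get(pageObjNums.get(i)).get("Annots");
            if (!(annots instanceof List)) {
                continue; // this page has no /Annots array
            }
            for (Object ref : (List<?>) annots) {
                Map<String, Object> target = objects.get((Integer) ref);
                if (target != null && "Page".equals(target.get("Type"))) {
                    problems.add("page " + (i + 1) + " /Annots -> " + ref + " 0 R");
                }
            }
        }
        return problems;
    }

    // Hypothetical three-page file: page 3 lists a real annotation (70 0 R)
    // and, by mistake, page 1 itself (43 0 R).
    static List<String> demo() {
        Map<Integer, Map<String, Object>> objects = new HashMap<>();
        objects.put(43, dict("Type", "Page"));
        objects.put(50, dict("Type", "Page"));
        objects.put(60, dict("Type", "Page", "Annots", List.of(70, 43)));
        objects.put(70, dict("Type", "Annot", "Subtype", "Link"));
        return findPageRefsInAnnots(objects, List.of(43, 50, 60));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [page 3 /Annots -> 43 0 R]
    }
}
```

Reporting the bad entries rather than throwing keeps the check usable as a pre-flight pass over a whole file before any annotation code touches it.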


was (Author: tilman):
[~mkl] Your suspicion is correct :-)

> PDPage.getAnnotations() causes subsequent calls to PDDocument.getPages() to fail
> --------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5278
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5278
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.24
>            Reporter: Alistair Oldfield
>            Priority: Major
>
> I have stumbled across a strange issue with a certain PDF where PDPage.getAnnotations() causes subsequent calls to PDDocument.getPages() to fail.
>  
> I am not at liberty to share the PDF publicly, but am happy to DM the PDF privately if it helps.
>  
> The code to reproduce is pretty straightforward:
>  
>  
> {code:java}
> import java.io.File;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
>
> public class AnnotationsTest {
>
>     public static void main(String[] args) throws Exception {
>         try (PDDocument doc = PDDocument.load(new File(args[0]))) {
>             for (PDPage page : doc.getPages()) {
>                 // This call makes the document non-re-iterable in the next block;
>                 // commenting it out lets the second loop pass.
>                 page.getAnnotations();
>             }
>
>             System.out.println("We get here, no problem - not sure why we can't re-iterate again...");
>
>             // doc.getPages() fails here.
>             for (PDPage page : doc.getPages()) {
>                 // do something
>             }
>         }
>     }
> }
> {code}
> The exception:
>
> Exception in thread "main" java.lang.IllegalStateException: Expected 'Page' but found COSName{Annot}
>     at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:266)
>     at org.apache.pdfbox.pdmodel.PDPageTree.access$400(PDPageTree.java:43)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:224)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:172)
>     at AnnotationsTest.main(AnnotationsTest.java:28)
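The exception fits Tilman's diagnosis if one assumes that the annotation code writes /Type /Annot into each dictionary it wraps; that assumption is inferred from the message, not taken from the PDFBox sources. Because "43 0 R" names one shared object that is reachable both from the page tree and from page 7's /Annots array, such a write would be visible to the next getPages() walk. The toy model below (hypothetical class, methods, and object numbers other than 43; not PDFBox code) replays the reported sequence with plain maps: the first walk succeeds, the annotation pass rewrites the shared object, and the second walk fails.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT PDFBox code: all indirect objects live in one shared table, so
// the page tree and an /Annots array that both say "43 0 R" share one Map.
public class SharedObjectDemo {

    static final Map<Integer, Map<String, Object>> OBJECTS = new HashMap<>();

    // Build a mutable dictionary from alternating key/value arguments.
    static Map<String, Object> dict(Object... keyValues) {
        Map<String, Object> d = new HashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    // ASSUMPTION being modelled: wrapping an /Annots entry stamps /Type /Annot
    // onto whatever dictionary the entry resolves to.
    static void toyGetAnnotations(List<Integer> annotRefs) {
        for (int ref : annotRefs) {
            OBJECTS.get(ref).put("Type", "Annot");
        }
    }

    // Page-tree walk that insists on /Type /Page, like the check in the report.
    static List<Integer> toyGetPages(List<Integer> kids) {
        List<Integer> pages = new ArrayList<>();
        for (int ref : kids) {
            Object type = OBJECTS.get(ref).get("Type");
            if (!"Page".equals(type)) {
                throw new IllegalStateException("Expected 'Page' but found " + type);
            }
            pages.add(ref);
        }
        return pages;
    }

    // Replays the report; returns the second walk's exception message, or null.
    static String run() {
        OBJECTS.clear();
        OBJECTS.put(43, dict("Type", "Page"));   // page 1
        OBJECTS.put(49, dict("Type", "Page"));   // page 7 (hypothetical number)
        OBJECTS.put(90, dict("Type", "Annot"));  // a genuine annotation
        List<Integer> kids = List.of(43, 49);
        List<Integer> page7Annots = List.of(90, 43); // 43 0 R is the bad entry

        for (int ref : toyGetPages(kids)) {      // first walk succeeds
            if (ref == 49) {
                toyGetAnnotations(page7Annots);  // rewrites shared object 43
            }
        }
        try {
            toyGetPages(kids);                   // second walk
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints Expected 'Page' but found Annot
    }
}
```

Dropping the bad `43` from `page7Annots` leaves object 43 untouched, and `run()` then returns null, which is the behaviour the reporter expected.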



--
This message was sent by Atlassian Jira
(v8.3.4#803005)