Posted to dev@pdfbox.apache.org by "Alistair Oldfield (Jira)" <ji...@apache.org> on 2021/09/07 19:57:00 UTC

[jira] [Created] (PDFBOX-5278) PDPage.getAnnotations() causes subsequent calls to PDDocument.getPages() to fail

Alistair Oldfield created PDFBOX-5278:
-----------------------------------------

             Summary: PDPage.getAnnotations() causes subsequent calls to PDDocument.getPages() to fail
                 Key: PDFBOX-5278
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5278
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 2.0.24
            Reporter: Alistair Oldfield


I have stumbled across a strange issue with a certain PDF: calling PDPage.getAnnotations() causes a subsequent iteration over PDDocument.getPages() to fail with an IllegalStateException.

I am not at liberty to share the PDF publicly, but I am happy to send it privately if that helps.

The code to reproduce it is straightforward:

{code:java}
import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class AnnotationsTest {

	public static void main(String[] args) throws Exception {

		// args[0] is the path to the PDF that triggers the problem.
		try (PDDocument doc = PDDocument.load(new File(args[0]))) {

			for (PDPage page : doc.getPages()) {
				// This call makes the document fail to iterate in the next block;
				// commenting it out lets the second loop pass.
				page.getAnnotations();
			}

			System.out.println("We get here, no problem - not sure why we can't re-iterate again...");

			// Iterating doc.getPages() a second time throws IllegalStateException.
			for (PDPage page : doc.getPages()) {
				// do something
			}
		}
	}
}

{code}
The exception:

Exception in thread "main" java.lang.IllegalStateException: Expected 'Page' but found COSName{Annot}
	at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:266)
	at org.apache.pdfbox.pdmodel.PDPageTree.access$400(PDPageTree.java:43)
	at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:224)
	at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:172)
	at com.onlinedoctranslator.test.AnnotationsTest.main(AnnotationsTest.java:28)
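
The stack trace shows that a dictionary reached by the page iterator carried a Type of Annot on the second pass, although the first pass over the same tree succeeded. One possible explanation, which is a guess and not confirmed by this report, is that the PDF is malformed so that an annotation list points at a node that is also a page, and that reading the annotations re-tags that shared node. The following is only a toy model of that failure mode: every class and method name in it (SharedNodeDemo, node, walkPages, wrapAsAnnotations) is a hypothetical stand-in, and none of it is PDFBox code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not PDFBox types or PDFBox behaviour.
final class SharedNodeDemo {

    // A "dictionary" is modelled as a mutable map holding a Type entry.
    static Map<String, String> node(String type) {
        Map<String, String> dict = new HashMap<>();
        dict.put("Type", type);
        return dict;
    }

    // Stand-in for a page iterator that validates the Type tag of each node it visits.
    static int walkPages(List<Map<String, String>> kids) {
        int count = 0;
        for (Map<String, String> kid : kids) {
            String type = kid.get("Type");
            if (!"Page".equals(type)) {
                throw new IllegalStateException("Expected 'Page' but found " + type);
            }
            count++;
        }
        return count;
    }

    // Stand-in for an accessor that, as a side effect, stamps every node it wraps as an annotation.
    static void wrapAsAnnotations(List<Map<String, String>> annots) {
        for (Map<String, String> annot : annots) {
            annot.put("Type", "Annot");
        }
    }

    // Runs two passes over the same page list with an annotation read in between.
    static String run() {
        Map<String, String> page1 = node("Page");
        Map<String, String> page2 = node("Page");
        List<Map<String, String>> kids = new ArrayList<>();
        kids.add(page1);
        kids.add(page2);

        // Assumed malformed input: page1's annotation list refers to page2's node.
        List<Map<String, String>> annotsOfPage1 = new ArrayList<>();
        annotsOfPage1.add(page2);

        walkPages(kids);                  // first pass: both nodes are still tagged Page
        wrapAsAnnotations(annotsOfPage1); // side effect re-tags the shared node
        try {
            walkPages(kids);              // second pass now meets a node tagged Annot
            return "second pass ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If the real cause is similar, the shape of the symptom would match: the first loop succeeds, and the second loop fails only when the annotations were read in between. Whether that is what happens with this particular PDF can only be confirmed by inspecting the file.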


