Posted to users@pdfbox.apache.org by Stefan Wurzinger <St...@cloudflight.io> on 2020/12/01 16:10:43 UTC

PDDocument.load sometimes fails with IOException: Requested page with index 2 was not written before

Hi,

I’m randomly getting errors while loading PDF documents with the PDDocument.load() method. Unfortunately I couldn’t reproduce it reliably; I just see it happen sometimes. When I retry in the same process (and on the same machine), it usually fails again. When I retry later, it usually just works.

The document files are large, about 2 to 3 GB on average.

The (virtual) machine where the process runs can consume up to 20 GB of memory.


The stack trace and error message are always the same (although the failure occurs at different places where PDDocument.load() is called) and look like this:

java.io.IOException: Requested page with index 2 was not written before.
                at org.apache.pdfbox.io.ScratchFile.readPage(ScratchFile.java:324)
                at org.apache.pdfbox.io.ScratchFileBuffer.ensureAvailableBytesInPage(ScratchFileBuffer.java:177)
                at org.apache.pdfbox.io.ScratchFileBuffer.read(ScratchFileBuffer.java:426)
                at org.apache.pdfbox.pdfparser.COSParser.isString(COSParser.java:2478)
                at org.apache.pdfbox.pdfparser.COSParser.bfSearchForLastEOFMarker(COSParser.java:1871)
                at org.apache.pdfbox.pdfparser.COSParser.bfSearchForObjects(COSParser.java:1556)
                at org.apache.pdfbox.pdfparser.COSParser.rebuildTrailer(COSParser.java:2196)
                at org.apache.pdfbox.pdfparser.COSParser.retrieveTrailer(COSParser.java:281)
                at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:173)
                at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
                at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1222)
                at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1122)

(it’s always “index 2”)

Can anybody give me a hint why this error might occur? Could it be some (hidden) out-of-memory or out-of-disk-space issue? Could it be a PDFBox bug? Could it be a timing / caching / buffering issue? Or something else entirely?
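One way to test the out-of-memory / out-of-disk-space hypothesis is to log the JVM heap limits and the free space in the temporary directory right before a load, or right after one fails. The following is a minimal, stdlib-only sketch; the class name ResourceSnapshot is hypothetical, and it assumes that scratch files, when PDFBox is configured to buffer on disk without an explicit temp directory, end up in java.io.tmpdir.

```java
import java.io.File;

// Hypothetical diagnostic helper, not part of PDFBox.
final class ResourceSnapshot {
    private static final long MIB = 1024L * 1024L;

    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long maxHeap = rt.maxMemory();                       // -Xmx limit (Long.MAX_VALUE if unlimited)
        long usedHeap = rt.totalMemory() - rt.freeMemory();  // heap currently in use
        // Assumption: disk-backed scratch files go to java.io.tmpdir unless configured otherwise.
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        long usableDisk = tmp.getUsableSpace();              // 0 if the path names no partition
        return String.format("heap used=%d MiB, heap max=%d MiB, tmpdir=%s, usable disk=%d MiB",
                usedHeap / MIB, maxHeap / MIB, tmp.getAbsolutePath(), usableDisk / MIB);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this line next to each failed load would show whether failures coincide with low heap headroom or a nearly full temp partition.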

Thanks for any hint.

Best regards,
Stefan


Re: PDDocument.load sometimes fails with IOException: Requested page with index 2 was not written before

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,
What PDFBox version are you using?
Do you use multithreading?
Is the disk space limited?
What parameters did you pass to load() (e.g. memory settings)?
Apparently, "load(InputStream input)".
Can you reproduce the effect by loading and closing the file in a loop?
Tilman
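The suggestion to load and close the file in a loop can be turned into a small harness. This is a sketch: LoadLoop and Loader are hypothetical names, and the stub in main only simulates failures. In real use, the lambda would call PDDocument.load() on one of the affected files and close the document, as outlined in the comment (assuming PDFBox 2.x, where PDDocument is Closeable).

```java
import java.io.IOException;

// Hypothetical reproduction harness. With PDFBox 2.x the loader could be, e.g.:
//   () -> { try (InputStream in = new FileInputStream(path);
//                PDDocument doc = PDDocument.load(in)) { } }
final class LoadLoop {
    @FunctionalInterface
    interface Loader {
        void loadAndClose() throws IOException;
    }

    /** Runs the loader {@code iterations} times and returns how many attempts threw. */
    static int countFailures(Loader loader, int iterations) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                loader.loadAndClose();
            } catch (IOException e) {
                failures++;
                System.err.println("iteration " + i + " failed: " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stub standing in for the real PDFBox call: fails on every third attempt.
        int[] calls = {0};
        Loader flaky = () -> {
            if (++calls[0] % 3 == 0) {
                throw new IOException("simulated failure");
            }
        };
        System.out.println(countFailures(flaky, 9) + " of 9 attempts failed");
    }
}
```

If the real loader fails repeatedly within one process but not in a fresh one, that would match the behaviour described in the original post.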



