Posted to dev@pdfbox.apache.org by "Patrick Davila Kochan (Jira)" <ji...@apache.org> on 2022/07/22 09:03:00 UTC

[jira] [Created] (PDFBOX-5480) PDDocument.load throws IOException in PDF

Patrick Davila Kochan created PDFBOX-5480:
---------------------------------------------

             Summary: PDDocument.load throws IOException in PDF
                 Key: PDFBOX-5480
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5480
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing, PDModel
    Affects Versions: 2.0.26, 2.0.25
         Environment: Ubuntu 20.04.4 LTS
Java OpenJDK 11.0.12-open
            Reporter: Patrick Davila Kochan
         Attachments: example.pdf

I use PDDocument in my application and noticed that its load method throws an IOException ("Error: End-of-File, expected line") for certain PDF files, such as the attached example.pdf.

My code:

{code:java}
protected List<String> getLocalPages(final Resource completeEditionResource, final Edition edition, final int firstPage) throws Exception {
        PDDocument document = null;
        try {
            final InputStream in = completeEditionResource.getInputStream();
            document = PDDocument.load(in, MemoryUsageSetting.setupTempFileOnly());
            PdfUtils.disableImageCache(document);
            return splitAndSavePages(document, firstPage, completeEditionResource, edition.getPublishedDate());
        } finally {
            if (document != null) {
                document.close();
            }
            completeEditionResource.getInputStream().reset();
        }
}{code}

Exception thrown:

{code:java}
java.io.IOException: Error: End-of-File, expected line
    at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1107)
    at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2650)
    at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2633)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1230)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1148)
    at com.flip.CompletePdfAnalyzer.getLocalPages(CompletePdfAnalyzer.java:162){code}

To verify that the input stream was correct, I saved the PDF to disk with FileUtils.copyInputStreamToFile (Apache Commons IO) just before calling PDDocument.load, and the copy succeeded.
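
One thing that may be worth ruling out (this is an assumption, since the report does not show where the verification copy was made): if the copy read from the same InputStream instance that was later handed to PDDocument.load, that stream would already be at end-of-file when the parser tried to read the header line. The sketch below reproduces only that effect, using plain java.io classes and no PDFBox calls. The class name StreamReuseDemo, the helper drain and the sample bytes are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: a stream that has been fully read returns EOF afterwards.
class StreamReuseDemo {

    // Reads the stream to its end and returns the bytes, similar in effect
    // to a copy-to-file helper that consumes its input.
    static byte[] drain(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in bytes for a PDF file (assumption: any non-empty content works here).
        byte[] pdfLike = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);

        ByteArrayInputStream in = new ByteArrayInputStream(pdfLike);
        byte[] copy = drain(in);
        System.out.println("copied bytes: " + copy.length);

        // The same stream instance is now exhausted; read() reports end of stream (-1).
        System.out.println("next read on same stream: " + in.read());

        // A fresh stream over the same data starts at the header again.
        ByteArrayInputStream fresh = new ByteArrayInputStream(copy);
        System.out.println("first byte of fresh stream: " + (char) fresh.read());
    }
}
```

If this matches the situation, obtaining a new stream from the resource, or loading from the file that was just saved, before calling PDDocument.load would be the first thing to try. If the stream was not reused, this sketch does not apply and the attached file itself would need inspection.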

--
This message was sent by Atlassian Jira
(v8.20.10#820010)