Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (Jira)" <ji...@apache.org> on 2022/07/22 14:43:00 UTC
[jira] [Commented] (PDFBOX-5480) PDDocument.load throws IOException in PDF

    [ https://issues.apache.org/jira/browse/PDFBOX-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570062#comment-17570062 ] 

Andreas Lehmkühler commented on PDFBOX-5480:
--------------------------------------------

The attached file works like a charm. Maybe you overlooked some issue with the input stream? Maybe it wasn't complete?


> PDDocument.load throws IOException in PDF
> -----------------------------------------
>
>                 Key: PDFBOX-5480
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5480
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, PDModel
>    Affects Versions: 2.0.25, 2.0.26
>         Environment: Ubuntu 20.04.4 LTS
> Java OpenJDK 11.0.12-open
>            Reporter: Patrick Davila Kochan
>            Priority: Major
>         Attachments: example.pdf
>
>
> I use the PDDocument in my application and noticed that the load method throws an IOException (Error: End-of-File, expected line) with certain PDF files like the one in the attachment.
>  
> My code:
>  
> {code:java}
> protected List<String> getLocalPages(final Resource completeEditionResource, final Edition edition, final int firstPage) throws Exception {
>     PDDocument document = null;
>     try {
>         final InputStream in = completeEditionResource.getInputStream();
>         document = PDDocument.load(in, MemoryUsageSetting.setupTempFileOnly());
>         PdfUtils.disableImageCache(document);
>         return splitAndSavePages(document, firstPage, completeEditionResource, edition.getPublishedDate());
>     } finally {
>         if (document != null) {
>             document.close();
>         }
>         completeEditionResource.getInputStream().reset();
>     }
> }{code}
>  
> Exception thrown:
>  
> {code:java}
> java.io.IOException: Error: End-of-File, expected line
>     at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1107)
>     at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2650)
>     at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2633)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1230)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1148)
>     at com.flip.CompletePdfAnalyzer.getLocalPages(CompletePdfAnalyzer.java:162){code}
>  
>  
> I successfully downloaded the PDF using FileUtils.copyInputStreamToFile from Apache Commons-IO just before PDDocument.load to verify that the inputStream was correct.
>  
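
One possible explanation, not confirmed anywhere in this thread: the stack trace ends in COSParser.parseHeader, so the parser hit end-of-file before it had read even the first line, which is what an already consumed stream would look like. The reporter copied the stream to a file "just before PDDocument.load", and the finally block calls reset() on getInputStream(), which hints that the resource may hand out the same stream instance each time. Whether it really does depends on the Resource implementation, which is not shown. The sketch below uses plain JDK classes only; the class name StreamReuseSketch, the helper drain, and the placeholder content are made up for illustration and stand in for the verification copy and the parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamReuseSketch {

    /** Reads the stream to its end and returns the number of bytes that were left in it. */
    static int drain(InputStream in) throws IOException {
        return in.readAllBytes().length; // readAllBytes exists since Java 9
    }

    public static void main(String[] args) throws IOException {
        // Placeholder for the PDF content; only the byte count matters here.
        byte[] content = "%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII);

        // Case 1: one stream instance shared by two consumers.
        InputStream shared = new ByteArrayInputStream(content);
        int seenByCopy = drain(shared);   // hypothetical verification copy
        int seenByParser = drain(shared); // hypothetical parser on the same instance
        System.out.println("shared stream: copy read " + seenByCopy
                + " bytes, parser read " + seenByParser + " bytes");

        // Case 2: buffer the bytes once, then give each consumer its own stream.
        byte[] buffered = new ByteArrayInputStream(content).readAllBytes();
        int copyAgain = drain(new ByteArrayInputStream(buffered));
        int parserAgain = drain(new ByteArrayInputStream(buffered));
        System.out.println("fresh streams: copy read " + copyAgain
                + " bytes, parser read " + parserAgain + " bytes");
    }
}
```

In the first case the second consumer reads 0 bytes because the first one left the stream at its end. If that is what happened here, loading from the file that was just written, or from a byte array buffered once, would avoid the problem. Buffering a whole PDF in memory trades heap for simplicity, so for large files the temporary file is probably the better fit.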


