Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (Jira)" <ji...@apache.org> on 2022/07/22 14:43:00 UTC
[jira] [Commented] (PDFBOX-5480) PDDocument.load throws IOException in PDF
[ https://issues.apache.org/jira/browse/PDFBOX-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570062#comment-17570062 ]
Andreas Lehmkühler commented on PDFBOX-5480:
--------------------------------------------
The attached file works like a charm. Maybe you overlooked an issue with the input stream? Maybe it wasn't complete?
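
One way an input stream can end up incomplete is sketched below. It is not confirmed as the cause here; it only assumes that the stream was already read to the end once (the report mentions a verification copy made just before PDDocument.load) and that the same stream object was then handed to the parser. The class and method names are hypothetical, and the sketch uses only the JDK so it runs without PDFBox.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of PDFBox or of the reporter's code.
class StreamReuseSketch {

    /** Reads the stream to the end and returns whatever was still left in it. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for real PDF content; only the header line is used here.
        byte[] pdfBytes = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(pdfBytes);

        // First consumer (for example a verification copy to disk) drains the stream.
        byte[] firstRead = readFully(in);

        // A second consumer of the SAME stream object finds it already at end-of-file.
        byte[] secondRead = readFully(in);

        System.out.println("first read:  " + firstRead.length + " bytes");
        System.out.println("second read: " + secondRead.length + " bytes");

        // Possible workaround: buffer the bytes once and give every consumer
        // its own fresh stream over them (in PDFBox 2.0.x the bytes could also
        // be passed to PDDocument.load(byte[]) directly).
        InputStream fresh = new ByteArrayInputStream(firstRead);
        System.out.println("fresh stream: " + readFully(fresh).length + " bytes");
    }
}
```

If the second read comes back empty in the real application as well, the parser would hit end-of-file while looking for the header line, which would match the reported stack trace.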
> PDDocument.load throws IOException in PDF
> -----------------------------------------
>
> Key: PDFBOX-5480
> URL: https://issues.apache.org/jira/browse/PDFBOX-5480
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing, PDModel
> Affects Versions: 2.0.25, 2.0.26
> Environment: Ubuntu 20.04.4 LTS
> Java OpenJDK 11.0.12-open
> Reporter: Patrick Davila Kochan
> Priority: Major
> Attachments: example.pdf
>
>
> I use PDDocument in my application and noticed that the load method throws an IOException (Error: End-of-File, expected line) for certain PDF files, such as the attached one.
>
> My code:
>
> {code:java}
> protected List<String> getLocalPages(final Resource completeEditionResource, final Edition edition, final int firstPage) throws Exception {
>     PDDocument document = null;
>     try {
>         final InputStream in = completeEditionResource.getInputStream();
>         // Parse the PDF, buffering to temp files instead of main memory.
>         document = PDDocument.load(in, MemoryUsageSetting.setupTempFileOnly());
>         PdfUtils.disableImageCache(document);
>         return splitAndSavePages(document, firstPage, completeEditionResource, edition.getPublishedDate());
>     } finally {
>         if (document != null) {
>             document.close();
>         }
>         completeEditionResource.getInputStream().reset();
>     }
> }{code}
>
> Exception thrown:
>
> {code:java}
> java.io.IOException: Error: End-of-File, expected line
> at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1107)
> at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2650)
> at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2633)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1230)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1148)
> at com.flip.CompletePdfAnalyzer.getLocalPages(CompletePdfAnalyzer.java:162){code}
>
>
> I successfully downloaded the PDF using FileUtils.copyInputStreamToFile from Apache Commons-IO just before PDDocument.load to verify that the inputStream was correct.
>