Posted to dev@pdfbox.apache.org by "Patrick Davila Kochan (Jira)" <ji...@apache.org> on 2022/07/22 09:03:00 UTC
[jira] [Created] (PDFBOX-5480) PDDocument.load throws IOException in PDF
Patrick Davila Kochan created PDFBOX-5480:
---------------------------------------------
Summary: PDDocument.load throws IOException in PDF
Key: PDFBOX-5480
URL: https://issues.apache.org/jira/browse/PDFBOX-5480
Project: PDFBox
Issue Type: Bug
Components: Parsing, PDModel
Affects Versions: 2.0.26, 2.0.25
Environment: Ubuntu 20.04.4 LTS
Java OpenJDK 11.0.12-open
Reporter: Patrick Davila Kochan
Attachments: example.pdf
I use PDDocument in my application and noticed that PDDocument.load throws an IOException ("Error: End-of-File, expected line") for certain PDF files, such as the attached example.pdf.
My code:
{code:java}
protected List<String> getLocalPages(final Resource completeEditionResource, final Edition edition, final int firstPage) throws Exception {
    PDDocument document = null;
    try {
        final InputStream in = completeEditionResource.getInputStream();
        // Buffer parsed data in temporary files only, not in main memory
        document = PDDocument.load(in, MemoryUsageSetting.setupTempFileOnly());
        // PdfUtils, splitAndSavePages and Edition are application code, not PDFBox
        PdfUtils.disableImageCache(document);
        return splitAndSavePages(document, firstPage, completeEditionResource, edition.getPublishedDate());
    } finally {
        if (document != null) {
            document.close();
        }
        completeEditionResource.getInputStream().reset();
    }
}{code}
Exception thrown:
{code:java}
java.io.IOException: Error: End-of-File, expected line
at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1107)
at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2650)
at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2633)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1230)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1148)
at com.flip.CompletePdfAnalyzer.getLocalPages(CompletePdfAnalyzer.java:162){code}
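The trace shows the failure inside COSParser.parseHeader, that is, while PDFBox was still reading the first line(s) of input looking for the PDF header, before any object was parsed. That pattern is consistent with the parser receiving an empty or already-consumed stream, or bytes with no "%PDF-" header, although the report alone does not establish which. The following JDK-only sketch inspects what a stream actually contains before it is handed to the parser. The class name PdfStreamCheck and the 1024-byte search window are hypothetical choices for illustration, not PDFBox behavior.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic helper: reports what a PDF parser would receive. */
final class PdfStreamCheck {

    // Window searched for the "%PDF-" marker (an assumption, not a PDFBox constant).
    static final int HEADER_WINDOW = 1024;

    /** Returns a short diagnosis of the remaining bytes. Consumes the stream. */
    static String diagnose(InputStream in) throws IOException {
        byte[] data = in.readAllBytes(); // Java 9+
        if (data.length == 0) {
            return "empty stream: nothing left to read";
        }
        int window = Math.min(data.length, HEADER_WINDOW);
        // ISO-8859-1 maps every byte to one char, so binary content is safe here
        String head = new String(data, 0, window, StandardCharsets.ISO_8859_1);
        if (!head.contains("%PDF-")) {
            return "no %PDF- header in first " + window + " bytes";
        }
        return "header found, " + data.length + " bytes";
    }

    public static void main(String[] args) throws IOException {
        byte[] pdfStart = "%PDF-1.4\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(diagnose(new ByteArrayInputStream(pdfStart)));   // header found, 9 bytes
        System.out.println(diagnose(new ByteArrayInputStream(new byte[0]))); // empty stream: nothing left to read
    }
}{code}

Because diagnose consumes the stream, it should be run on a separately obtained stream (or on buffered bytes), not on the instance that is later passed to PDDocument.load.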
To verify that the input stream was correct, I wrote it to a file with FileUtils.copyInputStreamToFile from Apache Commons IO just before calling PDDocument.load; the resulting file was the complete PDF.
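The report does not say whether that verification copy read from the same InputStream instance that was then passed to PDDocument.load. If it did, the copy would have read the stream to its end, leaving nothing for the parser, which would match an end-of-file error during header parsing. The JDK-only sketch below (the class name StreamReuseDemo is hypothetical) shows that effect with plain streams, and one way around it: read the bytes once, then give each consumer its own ByteArrayInputStream.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: a "verify by copying" step exhausts a one-shot InputStream. */
final class StreamReuseDemo {

    /** Copies the stream (like a verification step) and returns how many bytes remain. */
    static int bytesLeftAfterCopy(InputStream in) throws IOException {
        in.transferTo(new ByteArrayOutputStream()); // reads to end of stream (Java 9+)
        return in.readAllBytes().length;           // what a later parser would see
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = "%PDF-1.4\n".getBytes(StandardCharsets.ISO_8859_1);

        // Same instance used twice: the second consumer sees nothing
        System.out.println(bytesLeftAfterCopy(new ByteArrayInputStream(pdf))); // 0

        // Buffer once, then hand out independent streams
        byte[] data = new ByteArrayInputStream(pdf).readAllBytes();
        InputStream forCopy = new ByteArrayInputStream(data);
        // In the application this would be the stream passed to, e.g.,
        // PDDocument.load(forParser, MemoryUsageSetting.setupTempFileOnly())
        InputStream forParser = new ByteArrayInputStream(data);
        forCopy.transferTo(new ByteArrayOutputStream());
        System.out.println(forParser.readAllBytes().length); // 9
    }
}{code}

Buffering the whole file in memory trades heap usage for the ability to read it more than once; for large editions, writing it to a temporary file and loading from that file would be the lower-memory alternative.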
--
This message was sent by Atlassian Jira
(v8.20.10#820010)