Posted to dev@pdfbox.apache.org by "Michael Klink (Jira)" <ji...@apache.org> on 2020/10/29 15:30:00 UTC

[jira] [Commented] (PDFBOX-5006) java.io.IOException: Error: End-of-File, expected line during PDDocument.load

    [ https://issues.apache.org/jira/browse/PDFBOX-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222960#comment-17222960 ] 

Michael Klink commented on PDFBOX-5006:
---------------------------------------

{quote}but I can open them using other pdf viewer (like chrome pdf viewer for example)
{quote}
Please be aware that PDF viewers, and even GUI-based PDF editors, are usually very lax about document validity.

PDF libraries, on the other hand, must not be as lax, because there is no human in control.

For example, take an invalid PDF which, because of some error, would render as rubbish on the final recipient's computer. A GUI PDF editor may still open and show it (doing some repairs under the hood), because the user working with the editor can (actually *must*, it's part of the job) recognize the rubbish, stop processing the file, and request an undamaged file from the document source; thus, the final recipient never gets to see the rubbish. A PDF library in a fully automated workflow, though, cannot assume that anyone verifies that the PDF displays as desired in spite of its defects. Thus, it has to do its best to prevent the final recipient from seeing rubbish, and doing its best here can only mean refusing to process broken PDFs.

IMO PDFBox already repairs too many errors under the hood.

----

That all being said, though: I downloaded those files and did not encounter any issues in opening them with PDFBox. Are you sure your download of those files actually succeeded?
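
One way to check the download before handing the file to PDFBox is to look for the {{%PDF-}} header marker at the start of the file. The following is only a minimal sketch using the Java standard library, not PDFBox code: the class name {{PdfDownloadCheck}}, its method names, and the 1024-byte search window are assumptions made for illustration. An empty file, or an HTML error page saved under a .pdf name, would fail this check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: checks whether a downloaded
// file at least starts like a PDF before it is passed to a parser.
final class PdfDownloadCheck {

    // Assumption: search only the first 1024 bytes for the header marker.
    private static final int HEADER_WINDOW = 1024;

    // Returns true if the given leading bytes contain the "%PDF-" marker.
    static boolean hasPdfHeader(byte[] head, int length) {
        String text = new String(head, 0, length, StandardCharsets.ISO_8859_1);
        return text.contains("%PDF-");
    }

    // Reads up to HEADER_WINDOW bytes from the file and checks them.
    // An empty file yields zero bytes and therefore returns false.
    static boolean looksLikePdfFile(Path file) throws IOException {
        byte[] buf = new byte[HEADER_WINDOW];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // read() returns -1 at end of file, which ends the loop
            while (total < buf.length
                    && (n = in.read(buf, total, buf.length - total)) > 0) {
                total += n;
            }
        }
        return hasPdfHeader(buf, total);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path file = Paths.get(arg);
            String verdict = looksLikePdfFile(file)
                    ? "has a PDF header"
                    : "has no PDF header, the download may have failed";
            System.out.println(file + ": " + verdict);
        }
    }
}
```

Passing this check does not mean the file is a valid PDF; it only rules out the most obvious download failures before parsing is attempted.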

> java.io.IOException: Error: End-of-File, expected line during PDDocument.load
> -----------------------------------------------------------------------------
>
>                 Key: PDFBOX-5006
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5006
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.20, 2.0.21
>         Environment: Debian, MacOs, open JDK 12
>            Reporter: Nicolas M
>            Priority: Major
>
> I get an IOException when I try to open some PDFs using the library (calling PDDocument.load(pdfFile)). Here are some URLs of affected PDFs (I think it's the same problem for all of them):
>  * [https://www.buerger.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf]
>  * [http://www.geislerfarms.com/documents/filelibrary/Geisler_COVID_statement_0A7A094E1EFB7.pdf]
>  * [http://www.sahealth.sa.gov.au/wps/wcm/connect/c736e1d5-932e-4f8a-8e56-52ab10a214fd/SALHN+Governing+Board+Minutes+-+5+March+2020.pdf?MOD=AJPERES&CACHEID=ROOTWORKSPACE-c736e1d5-932e-4f8a-8e56-52ab10a214fd-niR9I3J]
> I think the files are not well formatted and don't respect the PDF specs, but I can open them using other PDF viewers (like the Chrome PDF viewer, for example)
>  
> Here is the stack trace : 
> {code:java}
> java.io.IOException: Error: End-of-File, expected line
>     at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1098)
>     at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2581)
>     at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2560)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1099)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1082)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1041)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:989)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
