Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (Jira)" <ji...@apache.org> on 2020/01/09 07:18:00 UTC

[jira] [Closed] (PDFBOX-4736) java.io.IOException: Error: End-of-File, expected line

     [ https://issues.apache.org/jira/browse/PDFBOX-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler closed PDFBOX-4736.
--------------------------------------
    Resolution: Invalid

I've downloaded the pdf manually and it can be rendered without any problems using PDFBox.

The website requires JavaScript, so simply opening an HTTP connection to the URL won't work: the response to such a plain request is not the PDF itself. This is not an issue with PDFBox but with your code, or rather your expectations. You have to download the file manually or use a URL that points directly to the file.
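A minimal sketch, not part of PDFBox, of how a caller could detect this situation before parsing: peek at the first bytes of the downloaded stream and check for the `%PDF-` header marker. The class `PdfSniffer` and method `startsWithPdfHeader` are hypothetical names chosen for illustration. The check is deliberately strict: PDFBox's own header parsing is more lenient and scans past some leading lines, so a stream that fails this check may occasionally still parse.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not a PDFBox API): checks whether a stream starts with "%PDF-".
public class PdfSniffer {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns true if the stream begins with "%PDF-". The stream must support
     * mark/reset (e.g. a BufferedInputStream) so the peeked bytes are pushed
     * back and remain available to the parser afterwards.
     */
    public static boolean startsWithPdfHeader(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PDF_MAGIC.length);
        try {
            byte[] head = new byte[PDF_MAGIC.length];
            int n = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    return false; // stream shorter than the header marker
                }
                n += r;
            }
            return Arrays.equals(head, PDF_MAGIC);
        } finally {
            in.reset(); // rewind so the caller can still parse from the first byte
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream html = new BufferedInputStream(new ByteArrayInputStream(
                "<html><script></script></html>".getBytes(StandardCharsets.US_ASCII)));
        InputStream pdf = new BufferedInputStream(new ByteArrayInputStream(
                "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(startsWithPdfHeader(html)); // false
        System.out.println(startsWithPdfHeader(pdf));  // true
    }
}
```

In the reporter's code, one would wrap `connection.getInputStream()` in a `BufferedInputStream`, call the check, and only pass that same stream to `PDDocument.load` when it returns true; otherwise report that the server sent something other than a PDF (for example a JavaScript-driven HTML page), which is clearer than the parser's "End-of-File, expected line" error.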


> java.io.IOException: Error: End-of-File, expected line
> ------------------------------------------------------
>
>                 Key: PDFBOX-4736
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4736
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.18
>         Environment: Windows
>            Reporter: Akos Kovacs
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>
> I am trying to read a PDF file from a given URL, but I get the following error message:
> {code:java}
> Exception in thread "main" java.io.IOException: Error: End-of-File, expected line
>     at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1124)
>     at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2595)
>     at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2574)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1222)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1122)
>     at ScreenshotFromPdf.Pdf2Image(ScreenshotFromPdf.java:19)
>     at ScreenshotFromPdf.main(ScreenshotFromPdf.java:33)
> {code}
> Example pdf file: [http://aplaidshirt.epizy.com/samplePDF.pdf]
> Code:
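> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import java.io.IOException;
> import java.io.InputStream;
> import java.net.HttpURLConnection;
> import java.net.URL;
> import javax.imageio.ImageIO;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.ImageType;
> import org.apache.pdfbox.rendering.PDFRenderer;
>
> public class ScreenshotFromPdf {
>     public static void Pdf2Image(String urlString) throws IOException, InterruptedException {
>         Thread.sleep(5000);
>         URL url = new URL(urlString);
>         HttpURLConnection connection = (HttpURLConnection) url.openConnection();
>         // PDDocument.load() looks for the PDF header ("%PDF-"); if the server returns
>         // an HTML page instead, the header is never found and parsing fails
>         try (InputStream is = connection.getInputStream();
>              PDDocument document = PDDocument.load(is)) {
>             PDFRenderer pdfRenderer = new PDFRenderer(document);
>             for (int page = 0; page < document.getNumberOfPages(); ++page) {
>                 // Render each page at 300 DPI and save it as a JPEG image
>                 BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
>                 File outputFile = new File("C:\\_privat\\pdftest\\" + page + "image.jpg");
>                 System.out.println(outputFile.toString());
>                 ImageIO.write(bim, "jpg", outputFile);
>             }
>         } finally {
>             connection.disconnect();
>         }
>     }
>
>     public static void main(String[] args) throws IOException, InterruptedException {
>         String url = "http://aplaidshirt.epizy.com/samplePDF.pdf";
>         ScreenshotFromPdf.Pdf2Image(url);
>     }
> }
> {code}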



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
